Having setup recursive DNS it was time to actually sort out a backup internet connection. I live in a Virgin Media area, but I still haven t forgiven them for my terrible Virgin experiences when moving here. Plus it involves a bigger contractual commitment. There are no altnets locally (though I m watching youfibre who have already rolled out in a few Belfast exchanges), so I decided to go for a 5G modem. That gives some flexibility, and is a bit easier to get up and running. I started by purchasing a ZTE MC7010. This had the advantage of being reasonably cheap off eBay, not having any wifi functionality I would just have to disable (it s going to plug it into the same router the FTTP connection terminates on), being outdoor mountable should I decide to go that way, and, finally, being powered via PoE. For now this device sits on the window sill in my study, which is at the top of the house. I printed a table stand for it which mostly does the job (though not as well with a normal, rather than flat, network cable). The router lives downstairs, so I ve extended a dedicated VLAN through the study switch, down to the core switch and out to the router. The PoE study switch can only do GigE, not 2.5Gb/s, but at present that s far from the limiting factor on the speed of the connection. The device is 3 branded, and, as it happens, I ve ended up with a 3 SIM in it. Up until recently my personal phone was with them, but they ve kicked me off Go Roam, so I ve moved. Going with 3 for the backup connection provides some slight extra measure of resiliency; we now have devices on all 4 major UK networks in the house. The SIM is a preloaded data only SIM good for a year; I don t expect to use all of the data allowance, but I didn t want to have to worry about unexpected excess charges. Performance turns out to be disappointing; I end up locking the device to 4G as the 5G signal is marginal - leaving it enabled results in constantly switching between 4G + 5G and a significant extra latency. The smokeping graph below shows a brief period where I removed the 4G lock and allowed 5G: Smokeping 4G vs 5G graph

(There s a handy zte.js script to allow doing this from the device web interface.) I get about 10Mb/s sustained downloads out of it. EE/Vodafone did not lead to significantly better results, so for now I m accepting it is what it is. I tried relocating the device to another part of the house (a little tricky while still providing switch-based PoE, but I have an injector), without much improvement. Equally pinning the 4G to certain bands provided a short term improvement (I got up to 40-50Mb/s sustained), but not reliably so. speedtest.net results

This is disappointing, but if it turns out to be a problem I can look at mounting it externally. I also assume as 5G is gradually rolled out further things will naturally improve, but that might be wishful thinking on my part. Rather than wait until my main link had a problem I decided to try a day working over the 5G connection. I spend a lot of my time either in browser based apps or accessing remote systems via SSH, so I m reasonably sensitive to a jittery or otherwise flaky connection. I picked a day that I did not have any meetings planned, but as it happened I ended up with an adhoc video call arranged. I m pleased to say that it all worked just fine; definitely noticeable as slower than the FTTP connection (to be expected), but all workable and even the video call was fine (at least from my end). Looking at the traffic graph shows the expected ~ 10Mb/s peak (actually a little higher, and looking at the FTTP stats for previous days not out of keeping with what we see there), and you can just about see the ~ 3Mb/s symmetric use by the video call at 2pm: 4G traffic during the work day

The test run also helped iron out the fact that the content filter was still enabled on the SIM, but that was easily resolved. Up next, vaguely automatic failover.

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs are tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn t enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are reproducible on several architectures. What does that even mean? Why should you care? How you can do the same for your project? What are the open issues? Read on, dear reader This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that lead to discussion on Fosstodon and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible that inspired me to some practical work. Let s look at how a maintainer release some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most project and build system. The following illustrate the steps a maintainer would take to prepare a release:

git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make distcheck
gpg -b libntlm-1.8.tar.gz

The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project have been doing releases since the late 1980 s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files, typically shell scripts generated by autoconf, makefile templates generated by automake, documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that happened. The XZUtils incident illustrate that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier how to mitigate this risk by using signed minimal source-only tarballs. The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts was not re-built through a typical autoconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as it may be that patching the relevant source code and rebuilding the package is not sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages and especially for generated content such as HTML, PDF, shell scripts this happens more than you would like. Publishing minimal source-only tarballs enable easier auditing of a project s code, to avoid the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub etc offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub has a security incident that cause generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer PGP signature using GnuPG, this can lead to similar backdoor scenarios that we had for XZUtils but originated with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted for some selected IP address that you want to target and not on everyone, thereby making it harder to discover. With all that discussion and rationale out of the way, let s return to the release process. I have added another step here:

make srcdist
gpg -b libntlm-1.8-src.tar.gz

Now the release is ready. I publish these four files in the Libntlm s Savannah Download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:

91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz

So how can you reproduce my artifacts? Here is how to reproduce them in a Ubuntu 22.04 container:

podman run -it --rm ubuntu:22.04
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make dist srcdist
sha256sum libntlm-*.tar.gz

You should see the exact same SHA256 checksum values. Hooray! This works because Trisquel 11 and Ubuntu 22.04 uses the same version of git, autoconf, automake, and libtool. These tools do not guarantee the same output content for all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed. Ideally, the artifacts should be possible to reproduce from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in a AlmaLinux 8 container replace almalinux:8 with rockylinux:8 if you prefer RockyLinux:

podman run -it --rm almalinux:8
dnf update -y
dnf install -y make wget gcc
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
tar xfa libntlm-1.8.tar.gz
cd libntlm-1.8
./configure
make dist
sha256sum libntlm-1.8.tar.gz

The source-only minimal tarball can be regenerated on Debian 11:

podman run -it --rm debian:11
apt-get update
apt-get install -y --no-install-recommends make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
make -f cfg.mk srcdist
sha256sum libntlm-1.8-src.tar.gz

As the Magnus Opus or chef-d uvre, let s recreate the full tarball directly from the minimal source-only tarball on Trisquel 11 replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer.

podman run -it --rm docker.io/kpengboy/trisquel:11.0
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
tar xfa libntlm-1.8-src.tar.gz
cd libntlm-v1.8
./bootstrap
./configure
make dist
sha256sum libntlm-1.8.tar.gz

Yay! You should now have great confidence in that the release artifacts correspond to what s in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts. I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot. Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e.., PROJECT-TAG/) and I ve adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab s exports and you can verify this with tools like diffoscope. GitLab name the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE) which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately they have logic that removes the v in the pathname so you will get a tarball with pathname libntlm-1.8/ instead of libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive differs. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/, otherwise the produced archive is bit-by-bit identical including timestamps. Savannah s CGIT interface uses archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise file content is identical. Savannah s GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to get SHA256 checksum to match, but fail on pathname within the archive. I ve chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion. Side note on git archive output: It seems different versions of git archive produce different results for the same repository. The version of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behave the same. The version of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS homebrew, and upcoming Ubuntu 24.04 behave in another way. Hopefully this will not change that often, but this would invalidate reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appears to be using modern git so the download tarballs from them would not match my tarballs even though the content would. Side note on ChangeLog: ChangeLog files were traditionally manually curated files with version history for a package. In recent years, several projects moved to dynamically generate them from git history (using tools like git2cl or gitlog-to-changelog). This has consequences for reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also output different outputs depending on the time zone of the person using it, which arguable is a simple bug that can be fixed. However this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm s ChangeLog file died on the surgery table here. So how would a distribution build these minimal source-only tarballs? I happen to help on the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuild all packages that rely on that particular code. To change this, the Debian libntlm package needs to Build-Depends on Debian s gnulib package. But there was one problem: similar to most projects that use gnulib, Libntlm depend on a particular git commit of gnulib, and Debian only ship one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and add a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allow an no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm s ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib that Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa and there is continuous integration testing of it as well, thanks to the Salsa CI team. Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get Git bundles to be reproducible but I never got it to work see my notes in gnulib s debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately. One open question is how to deal with the increased build dependencies that is triggered by this approach. Some people are surprised by this but I don t see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We ve done it for a long time through vendored code in non-minimal tarballs. Libntlm isn t the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern was used for other packages that uses gnulib such as coreutils, gzip, tar, bison etc (all are using gnulib) then they would all Build-Depends on git and gnulib. Cross-building those packages for a new architecture will therefor require git on that architecture first, which gets circular quick. The dependency on gnulib is real so I don t see that going away, and gnulib is a Architecture:all package. However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that doesn t require the git tool to extract the necessary files, but none that I found practical ideas welcome! Finally some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib s ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and has complicated interaction with CI/CD. The reason why I gave up git submodules now is because the particular commit to use is not recorded in the git archive output when git submodules is used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer made sense to continue to use a gnulib git submodule. One alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir if there is some practical problem with the GNULIB_URL towards a git bundle the GNULIB_REVISION in bootstrap.conf. The srcdist make rule is simple:

git archive --prefix=libntlm-v1.8/ -o libntlm-v1.8.tar.gz HEAD

Making the make dist generated tarball reproducible can be more complicated, however for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to the timestamp of the last commit in the git repository. Interestingly there seems to be a couple of different ways to accomplish this, Guix doesn t support minimal source-only tarballs but rely on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I m using now is fairly similar to the one I suggested over a year ago. If there are problems because all files in the tarball now use the same modification time, there is a solution by Bruno Haible that could be implemented. Side note on git tags: Some people may wonder why not verify a signed git tag instead of verifying a signed tarball of the git archive. Currently most git repositories uses SHA-1 for git commit identities, but SHA-1 is not a secure hash function. While current SHA-1 attacks can be detected and mitigated, there are fundamental doubts that a git SHA-1 commit identity uniquely refers to the same content that was intended. Verifying a git tag will never offer the same assurance, since a git tag can be moved or re-signed at any time. Verifying a git commit is better but then we need to trust SHA-1. Migrating git to SHA-256 would resolve this aspect, but most hosting sites such as GitLab and GitHub does not support this yet. There are other advantages to using signed tarballs instead of signed git commits or git tags as well, e.g., tar.gz can be a deterministically reproducible persistent stable offline storage format but .git sub-directory trees or git bundles do not offer this property. Doing continous testing of all this is critical to make sure things don t regress. Libntlm s pipeline definition now produce the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insists that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa. If this way of working plays out well, I hope to implement it in other projects too. What do you think? Happy Hacking!

Getting the Belgian eID to work on Linux systems should be fairly easy, although some people do struggle with it. For that reason, there is a lot of third-party documentation out there in the form of blog posts, wiki pages, and other kinds of things. Unfortunately, some of this documentation is simply wrong. Written by people who played around with things until it kind of worked, sometimes you get a situation where something that used to work in the past (but wasn't really necessary) now stopped working, but it's still added to a number of locations as though it were the gospel. And then people follow these instructions and now things don't work anymore. One of these revolves around OpenSC. OpenSC is an open source smartcard library that has support for a pretty large number of smartcards, amongst which the Belgian eID. It provides a PKCS#11 module as well as a number of supporting tools. For those not in the know, PKCS#11 is a standardized C API for offloading cryptographic operations. It is an API that can be used when talking to a hardware cryptographic module, in order to make that module perform some actions, and it is especially popular in the open source world, with support in NSS, amongst others. This library is written and maintained by mozilla, and is a low-level cryptographic library that is used by Firefox (on all platforms it supports) as well as by Google Chrome and other browsers based on that (but only on Linux, and as I understand it, only for linking with smartcards; their BoringSSL library is used for other things). The official eID software that we ship through eid.belgium.be, also known as "BeID", provides a PKCS#11 module for the Belgian eID, as well as a number of support tools to make interacting with the card easier, such as the "eID viewer", which provides the ability to read data from the card, and validate their signatures. While the very first public version of this eID PKCS#11 module was originally based on OpenSC, it has since been reimplemented as a PKCS#11 module in its own right, with no lineage to OpenSC whatsoever anymore. About five years ago, the Belgian eID card was renewed. At the time, a new physical appearance was the most obvious difference with the old card, but there were also some technical, on-chip, differences that are not so apparent. The most important one here, although it is not the only one, is the fact that newer eID cards now use a NIST P-384 elliptic curve-based private keys, rather than the RSA-based ones that were used in the past. This change required some changes to any PKCS#11 module that supports the eID; both the BeID one, as well as the OpenSC card-belpic driver that is written in support of the Belgian eID. Obviously, the required changes were implemented for the BeID module; however, the OpenSC card-belpic driver was not updated. While I did do some preliminary work on the required changes, I was unable to get it to work, and eventually other things took up my time so I never finished the implementation. If someone would like to finish the work that I started, the preliminal patch that I wrote could be a good start -- but like I said, it doesn't yet work. Also, you'll probably be interested in the official documentation of the eID card. Unfortunately, in the mean time someone added the Applet 1.8 ATR to the card-belpic.c file, without also implementing the required changes to the driver so that the PKCS#11 driver actually supports the eID card. The result of this is that if you have OpenSC installed in NSS for either Firefox or any Chromium-based browser, and it gets picked up before the BeID PKCS#11 module, then NSS will stop looking and pass all crypto operations to the OpenSC PKCS#11 module rather than to the official eID PKCS#11 module, and things will not work at all, causing a lot of confusion. I have therefore taken the following two steps:

The official eID packages now conflict with the OpenSC PKCS#11 module. Specifically only the PKCS#11 module, not the rest of OpenSC, so you can theoretically still use its tools. This means that once we release this new version of the eID software, when you do an upgrade and you have OpenSC installed, it will remove the PKCS#11 module and anything that depends on it. This is normal and expected.
I have filed a pull request against OpenSC that removes the Applet 1.8 ATR from the driver, so that OpenSC will stop claiming that it supports the 1.8 applet.

When the pull request is accepted, we will update the official eID software to make the conflict versioned, so that as soon as it works again you will again be able to install the OpenSC and BeID packages at the same time. In the mean time, if you have the OpenSC PKCS#11 module installed on your system, and your eID authentication does not work, try removing it.

In light of the recent cancellation of the queer.af domain registration by the Taliban, the fragile and difficult nature of country-code top-level domains (ccTLDs) has once again been comprehensively demonstrated. Since many people may not be aware of the risks, I thought I d give a solid explainer of the whole situation, and explain why you should, in general, not have anything to do with domains which are registered under ccTLDs.

Top-level What-Now? A top-level domain (TLD) is the last part of a domain name (the collection of words, separated by periods, after the `https://` in your web browser s location bar). It s the com in `example.com`, or the af in `queer.af`. There are two kinds of TLDs: country-code TLDs (ccTLDs) and generic TLDs (gTLDs). Despite all being TLDs, they re very different beasts under the hood.

What s the Difference? Generic TLDs are what most organisations and individuals register their domains under: old-school technobabble like com , net , or org , historical oddities like gov , and the new-fangled world of words like tech , social , and bank . These gTLDs are all regulated under a set of rules created and administered by ICANN (the Internet Corporation for Assigned Names and Numbers ), which try to ensure that things aren t a complete wild-west, limiting things like price hikes (well, sometimes, anyway), and providing means for disputes over names¹. Country-code TLDs, in contrast, are all two letters long², and are given out to countries to do with as they please. While ICANN kinda-sorta has something to do with ccTLDs (in the sense that it makes them exist on the Internet), it has no authority to control how a ccTLD is managed. If a country decides to raise prices by 100x, or cancel all registrations that were made on the 12th of the month, there s nothing anyone can do about it. If that sounds bad, that s because it is. Also, it s not a theoretical problem the Taliban deciding to asssert its bigotry over the little corner of the Internet namespace it has taken control of is far from the first time that ccTLDs have caused grief.

Shifting Sands The `queer.af` cancellation is interesting because, at the time the domain was reportedly registered, 2018, Afghanistan had what one might describe as, at least, a different political climate. Since then, of course, things have changed, and the new bosses have decided to get a bit more active. Those running `queer.af` seem to have seen the writing on the wall, and were planning on moving to another, less fraught, domain, but hadn t completed that move when the Taliban came knocking.

The Curious Case of Brexit When the United Kingdom decided to leave the European Union, it fell foul of the EU s rules for the registration of domains under the eu ccTLD³. To register (and maintain) a domain name ending in `.eu`, you have to be a resident of the EU. When the UK ceased to be part of the EU, residents of the UK were no longer EU residents. Cue much unhappiness, wailing, and gnashing of teeth when this was pointed out to Britons. Some decided to give up their domains, and move to other parts of the Internet, while others managed to hold onto them by various legal sleight-of-hand (like having an EU company maintain the registration on their behalf). In any event, all very unpleasant for everyone involved.

Geopolitics on the Internet?!? After Russia invaded Ukraine in February 2022, the Ukranian Vice Prime Minister asked ICANN to suspend ccTLDs associated with Russia. While ICANN said that it wasn t going to do that, because it wouldn t do anything useful, some domain registrars (the companies you pay to register domain names) ceased to deal in Russian ccTLDs, and some websites restricted links to domains with Russian ccTLDs. Whether or not you agree with the sort of activism implied by these actions, the fact remains that even the actions of a government that aren t directly related to the Internet can have grave consequences for your domain name if it s registered under a ccTLD. I don t think any gTLD operator will be invading a neighbouring country any time soon.

Money, Money, Money, Must Be Funny When you register a domain name, you pay a registration fee to a registrar, who does administrative gubbins and causes you to be able to control the domain name in the DNS. However, you don t own that domain name⁴ you re only renting it. When the registration period comes to an end, you have to renew the domain name, or you ll cease to be able to control it. Given that a domain name is typically your brand or identity online, the chances are you d prefer to keep it over time, because moving to a new domain name is a massive pain, having to tell all your customers or users that now you re somewhere else, plus having to accept the risk of someone registering the domain name you used to have and capturing your traffic it s all a gigantic hassle. For gTLDs, ICANN has various rules around price increases and bait-and-switch pricing that tries to keep a lid on the worst excesses of registries. While there are any number of reasonable criticisms of the rules, and the Internet community has to stay on their toes to keep ICANN from totally succumbing to regulatory capture, at least in the gTLD space there s some degree of control over price gouging. On the other hand, ccTLDs have no effective controls over their pricing. For example, in 2008 the Seychelles increased the price of `.sc` domain names from US$25 to US$75. No reason, no warning, just pay up .

Who Is Even Getting That Money? A closely related concern about ccTLDs is that some of the cool ones are assigned to countries that are not great. The poster child for this is almost certainly Libya, which has the ccTLD ly . While Libya was being run by a terrorist-supporting extremist, companies thought it was a great idea to have domain names that ended in `.ly`. These domain registrations weren t (and aren t) cheap, and it s hard to imagine that at least some of that money wasn t going to benefit the Gaddafi regime. Similarly, the British Indian Ocean Territory, which has the io ccTLD, was created in a colonialist piece of chicanery that expelled thousands of native Chagossians from Diego Garcia. Money from the registration of `.io` domains doesn t go to the (former) residents of the Chagos islands, instead it gets paid to the UK government. Again, I m not trying to suggest that all gTLD operators are wonderful people, but it s not particularly likely that the direct beneficiaries of the operation of a gTLD stole an island chain and evicted the residents.

Are ccTLDs Ever Useful? The answer to that question is an unqualified maybe . I certainly don t think it s a good idea to register a domain under a ccTLD for vanity purposes: because it makes a word, is the same as a file extension you like, or because it looks cool. Those ccTLDs that clearly represent and are associated with a particular country are more likely to be OK, because there is less impetus for the registry to try a naked cash grab. Unfortunately, ccTLD registries have a disconcerting habit of changing their minds on whether they serve their geographic locality, such as when auDA decided to declare an open season in the `.au` namespace some years ago. Essentially, while a ccTLD may have geographic connotations now, there s not a lot of guarantee that they won t fall victim to scope creep in the future. Finally, it might be somewhat safer to register under a ccTLD if you live in the location involved. At least then you might have a better idea of whether your domain is likely to get pulled out from underneath you. Unfortunately, as the `.eu` example shows, living somewhere today is no guarantee you ll still be living there tomorrow, even if you don t move house. In short, I d suggest sticking to gTLDs. They re at least lower risk than ccTLDs.

+1, Helpful If you ve found this post informative, why not buy me a refreshing beverage? My typing fingers (both of them) thank you in advance for your generosity.

Footnotes

don t make the mistake of thinking that I approve of ICANN or how it operates; it s an omnishambles of poor governance and incomprehensible decision-making.

corresponding roughly, though not precisely (because everything has to be complicated, because humans are complicated), to the entries in the ISO standard for Codes for the representation of names of countries and their subdivisions , ISO 3166.

yes, the EU is not a country; it s part of the roughly, though not precisely caveat mentioned previously.

despite what domain registrars try very hard to imply, without falling foul of deceptive advertising regulations.

This post is in part a response to an aspect of Nate s post Does Wayland really break everything? , but also my reflection on discussing Wayland protocol additions, a unique pleasure that I have been involved with for the past months¹.

Some facts Before I start I want to make a few things clear: The Linux desktop will be moving to Wayland² this is a fact at this point (and has been for a while), sticking to X11 makes no sense for future projects. From reading Wayland protocols and working with it at a much lower level than I ever wanted to, it is also very clear to me that Wayland is an exceptionally well-designed core protocol, and so are the additional extension protocols (xdg-shell & Co.). The modularity of Wayland is great, it gives it incredible flexibility and will for sure turn out to be good for the long-term viability of this project (and also provides a path to correct protocol issues in future, if one is found). In other words: Wayland is an amazing foundation to build on, and a lot of its design decisions make a lot of sense! The shift towards people seeing Linux more as an application developer platform, and taking PipeWire and XDG Portals into account when designing for Wayland is also an amazing development and I love to see this this holistic approach is something I always wanted! Furthermore, I think Wayland removes a lot of functionality that shouldn t exist in a modern compositor and that s a good thing too! Some of X11 s features and design decisions had clear drawbacks that we shouldn t replicate. I highly recommend to read Nate s blog post, it s very good and goes into more detail. And due to all of this, I firmly believe that any advancement in the Wayland space must come from within the project.

But! But! Of course there was a but coming I think while developing Wayland-as-an-ecosystem we are now entrenched into narrow concepts of how a desktop should work. While discussing Wayland protocol additions, a lot of concepts clash, people from different desktops with different design philosophies debate the merits of those over and over again never reaching any conclusion (just as you will never get an answer out of humans whether sushi or pizza is the clearly superior food, or whether CSD or SSD is better). Some people want to use Wayland as a vehicle to force applications to submit to their desktop s design philosophies, others prefer the smallest and leanest protocol possible, other developers want the most elegant behavior possible. To be clear, I think those are all very valid approaches. But this also creates problems: By switching to Wayland compositors, we are already forcing a lot of porting work onto toolkit developers and application developers. This is annoying, but just work that has to be done. It becomes frustrating though if Wayland provides toolkits with absolutely no way to reach their goal in any reasonable way. For Nate s Photoshop analogy: Of course Linux does not break Photoshop, it is Adobe s responsibility to port it. But what if Linux was missing a crucial syscall that Photoshop needed for proper functionality and Adobe couldn t port it without that? In that case it becomes much less clear on who is to blame for Photoshop not being available. A lot of Wayland protocol work is focused on the environment and design, while applications and work to port them often is considered less. I think this happens because the overlap between application developers and developers of the desktop environments is not necessarily large, and the overlap with people willing to engage with Wayland upstream is even smaller. The combination of Windows developers porting apps to Linux and having involvement with toolkits or Wayland is pretty much nonexistent. So they have less of a voice.

A quick detour through the neuroscience research lab I have been involved with Freedesktop, GNOME and KDE for an incredibly long time now (more than a decade), but my actual job (besides consulting for Purism) is that of a PhD candidate in a neuroscience research lab (working on the morphology of biological neurons and its relation to behavior). I am mostly involved with three research groups in our institute, which is about 35 people. Most of us do all our data analysis on powerful servers which we connect to using RDP (with KDE Plasma as desktop). Since I joined, I have been pushing the envelope a bit to extend Linux usage to data acquisition and regular clients, and to have our data acquisition hardware interface well with it. Linux brings some unique advantages for use in research, besides the obvious one of having every step of your data management platform introspectable with no black boxes left, a goal I value very highly in research (but this would be its own blogpost). In terms of operating system usage though, most systems are still Windows-based. Windows is what companies develop for, and what people use by default and are familiar with. The choice of operating system is very strongly driven by application availability, and WSL being really good makes this somewhat worse, as it removes the need for people to switch to a real Linux system entirely if there is the occasional software requiring it. Yet, we have a lot more Linux users than before, and use it in many places where it makes sense. I also developed a novel data acquisition software that even runs on Linux-only and uses the abilities of the platform to its fullest extent. All of this resulted in me asking existing software and hardware vendors for Linux support a lot more often. Vendor-customer relationship in science is usually pretty good, and vendors do usually want to help out. Same for open source projects, especially if you offer to do Linux porting work for them But overall, the ease of use and availability of required applications and their usability rules supreme. Most people are not technically knowledgeable and just want to get their research done in the best way possible, getting the best results with the least amount of friction.
KDE/Linux usage at a control station for a particle accelerator at Adlershof Technology Park, Germany, for reference (by 25years of KDE)³

Back to the point The point of that story is this: GNOME, KDE, RHEL, Debian or Ubuntu: They all do not matter if the necessary applications are not available for them. And as soon as they are, the easiest-to-use solution wins. There are many facets of easiest : In many cases this is RHEL due to Red Hat support contracts being available, in many other cases it is Ubuntu due to its mindshare and ease of use. KDE Plasma is also frequently seen, as it is perceived a bit easier to onboard Windows users with it (among other benefits). Ultimately, it comes down to applications and 3rd-party support though. Here s a dirty secret: In many cases, porting an application to Linux is not that difficult. The thing that companies (and FLOSS projects too!) struggle with and will calculate the merits of carefully in advance is whether it is worth the support cost as well as continuous QA/testing. Their staff will have to do all of that work, and they could spend that time on other tasks after all. So if they learn that porting to Linux not only means added testing and support, but also means to choose between the legacy X11 display server that allows for 1:1 porting from Windows or the new Wayland compositors that do not support the same features they need, they will quickly consider it not worth the effort at all. I have seen this happen. Of course many apps use a cross-platform toolkit like Qt, which greatly simplifies porting. But this just moves the issue one layer down, as now the toolkit needs to abstract Windows, macOS and Wayland. And Wayland does not contain features to do certain things or does them very differently from e.g. Windows, so toolkits have no way to actually implement the existing functionality in a way that works on all platforms. So in Qt s documentation you will often find texts like works everywhere except for on Wayland compositors or mobile ⁴. Many missing bits or altered behavior are just papercuts, but those add up. And if users will have a worse experience, this will translate to more support work, or people not wanting to use the software on the respective platform.

What s missing?

Window positioning SDI applications with multiple windows are very popular in the scientific world. For data acquisition (for example with microscopes) we often have one monitor with control elements and one larger one with the recorded image. There is also other configurations where multiple signal modalities are acquired, and the experimenter aligns windows exactly in the way they want and expects the layout to be stored and to be loaded upon reopening the application. Even in the image from Adlershof Technology Park above you can see this style of UI design, at mega-scale. Being able to pop-out elements as windows from a single-window application to move them around freely is another frequently used paradigm, and immensely useful with these complex apps. It is important to note that this is not a legacy design, but in many cases an intentional choice these kinds of apps work incredibly well on larger screens or many screens and are very flexible (you can have any window configuration you want, and switch between them using the (usually) great window management abilities of your desktop). Of course, these apps will work terribly on tablets and small form factors, but that is not the purpose they were designed for and nobody would use them that way. I assumed for sure these features would be implemented at some point, but when it became clear that that would not happen, I created the ext-placement protocol which had some good discussion but was ultimately rejected from the `xdg` namespace. I then tried another solution based on feedback, which turned out not to work for most apps, and now proposed xdg-placement (v2) in an attempt to maybe still get some protocol done that we can agree on, exploring more options before pushing the existing protocol for inclusion into the `ext` Wayland protocol namespace. Meanwhile though, we can not port any application that needs this feature, while at the same time we are switching desktops and distributions to Wayland by default.

Window position restoration Similarly, a protocol to save & restore window positions was already proposed in 2018, 6 years ago now, but it has still not been agreed upon, and may not even help multiwindow apps in its current form. The absence of this protocol means that applications can not restore their former window positions, and the user has to move them to their previous place again and again. Meanwhile, toolkits can not adopt these protocols and applications can not use them and can not be ported to Wayland without introducing papercuts.

Window icons Similarly, individual windows can not set their own icons, and not-installed applications can not have an icon at all because there is no desktop-entry file to load the icon from and no icon in the theme for them. You would think this is a niche issue, but for applications that create many windows, providing icons for them so the user can find them is fairly important. Of course it s not the end of the world if every window has the same icon, but it s one of those papercuts that make the software slightly less user-friendly. Even applications with fewer windows like LibrePCB are affected, so much so that they rather run their app through Xwayland for now. I decided to address this after I was working on data analysis of image data in a Python virtualenv, where my code and the Python libraries used created lots of windows all with the default yellow W icon, making it impossible to distinguish them at a glance. This is xdg-toplevel-icon now, but of course it is an uphill battle where the very premise of needing this is questioned. So applications can not use it yet.

Limited window abilities requiring specialized protocols Firefox has a picture-in-picture feature, allowing it to pop out media from a mediaplayer as separate floating window so the user can watch the media while doing other things. On X11 this is easily realized, but on Wayland the restrictions posed on windows necessitate a different solution. The xdg-pip protocol was proposed for this specialized usecase, but it is also not merged yet. So this feature does not work as well on Wayland.

Automated GUI testing / accessibility / automation Automation of GUI tasks is a powerful feature, so is the ability to auto-test GUIs. This is being worked on, with libei and wlheadless-run (and stuff like ydotool exists too), but we re not fully there yet.

Wayland is frustrating for (some) application authors As you see, there is valid applications and valid usecases that can not be ported yet to Wayland with the same feature range they enjoyed on X11, Windows or macOS. So, from an application author s perspective, Wayland does break things quite significantly, because things that worked before can no longer work and Wayland (the whole stack) does not provide any avenue to achieve the same result. Wayland does break screen sharing, global hotkeys, gaming latency (via no tearing ) etc, however for all of these there are solutions available that application authors can port to. And most developers will gladly do that work, especially since the newer APIs are usually a lot better and more robust. But if you give application authors no path forward except use Xwayland and be on emulation as second-class citizen forever , it just results in very frustrated application developers. For some application developers, switching to a Wayland compositor is like buying a canvas from the Linux shop that forces your brush to only draw triangles. But maybe for your avant-garde art, you need to draw a circle. You can approximate one with triangles, but it will never be as good as the artwork of your friends who got their canvases from the Windows or macOS art supply shop and have more freedom to create their art.

Triangles are proven to be the best shape! If you are drawing circles you are creating bad art! Wayland, via its protocol limitations, forces a certain way to build application UX often for the better, but also sometimes to the detriment of users and applications. The protocols are often fairly opinionated, a result of the lessons learned from X11. In any case though, it is the odd one out Windows and macOS do not pose the same limitations (for better or worse!), and the effort to port to Wayland is orders of magnitude bigger, or sometimes in case of the multiwindow UI paradigm impossible to achieve to the same level of polish. Desktop environments of course have a design philosophy that they want to push, and want applications to integrate as much as possible (same as macOS and Windows!). However, there are many applications out there, and pushing a design via protocol limitations will likely just result in fewer apps.

The porting dilemma I spent probably way too much time looking into how to get applications cross-platform and running on Linux, often talking to vendors (FLOSS and proprietary) as well. Wayland limitations aren t the biggest issue by far, but they do start to come come up now, especially in the scientific space with Ubuntu having switched to Wayland by default. For application authors there is often no way to address these issues. Many scientists do not even understand why their Python script that creates some GUIs suddenly behaves weirdly because Qt is now using the Wayland backend on Ubuntu instead of X11. They do not know the difference and also do not want to deal with these details even though they may be programmers as well, the real goal is not to fiddle with the display server, but to get to a scientific result somehow. Another issue is portability layers like Wine which need to run Windows applications as-is on Wayland. Apparently Wine s Wayland driver has some heuristics to make window positioning work (and I am amazed by the work done on this!), but that can only go so far.

A way out? So, how would we actually solve this? Fundamentally, this excessively long blog post boils down to just one essential question: Do we want to force applications to submit to a UX paradigm unconditionally, potentially loosing out on application ports or keeping apps on X11 eternally, or do we want to throw them some rope to get as many applications ported over to Wayland, even through we might sacrifice some protocol purity? I think we really have to answer that to make the discussions on wayland-protocols a lot less grueling. This question can be answered at the wayland-protocols level, but even more so it must be answered by the individual desktops and compositors. If the answer for your environment turns out to be Yes, we want the Wayland protocol to be more opinionated and will not make any compromises for application portability , then your desktop/compositor should just immediately NACK protocols that add something like this and you simply shouldn t engage in the discussion, as you reject the very premise of the new protocol: That it has any merit to exist and is needed in the first place. In this case contributors to Wayland and application authors also know where you stand, and a lot of debate is skipped. Of course, if application authors want to support your environment, you are basically asking them now to rewrite their UI, which they may or may not do. But at least they know what to expect and how to target your environment. If the answer turns out to be We do want some portability , the next question obviously becomes where the line should be drawn and which changes are acceptable and which aren t. We can t blindly copy all X11 behavior, some porting work to Wayland is simply inevitable. Some written rules for that might be nice, but probably more importantly, if you agree fundamentally that there is an issue to be fixed, please engage in the discussions for the respective MRs! We for sure do not want to repeat X11 mistakes, and I am certain that we can implement protocols which provide the required functionality in a way that is a nice compromise in allowing applications a path forward into the Wayland future, while also being as good as possible and improving upon X11. For example, the toplevel-icon proposal is already a lot better than anything X11 ever had. Relaxing ACK requirements for the ext namespace is also a good proposed administrative change, as it allows some compositors to add features they want to support to the shared repository easier, while also not mandating them for others. In my opinion, it would allow for a lot less friction between the two different ideas of how Wayland protocol development should work. Some compositors could move forward and support more protocol extensions, while more restrictive compositors could support less things. Applications can detect supported protocols at launch and change their behavior accordingly (ideally even abstracted by toolkits). You may now say that a lot of apps are ported, so surely this issue can not be that bad. And yes, what Wayland provides today may be enough for 80-90% of all apps. But what I hope the detour into the research lab has done is convince you that this smaller percentage of apps matters. A lot. And that it may be worthwhile to support them. To end on a positive note: When it came to porting concrete apps over to Wayland, the only real showstoppers so far⁵ were the missing window-positioning and window-position-restore features. I encountered them when porting my own software, and I got the issue as feedback from colleagues and fellow engineers. In second place was UI testing and automation support, the window-icon issue was mentioned twice, but being a cosmetic issue it likely simply hurts people less and they can ignore it easier. What this means is that the majority of apps are already fine, and many others are very, very close! A Wayland future for everyone is within our grasp! I will also bring my two protocol MRs to their conclusion for sure, because as application developers we need clarity on what the platform (either all desktops or even just a few) supports and will or will not support in future. And the only way to get something good done is by contribution and friendly discussion.

Footnotes
Apologies for the clickbait-y title it comes with the subject
When I talk about Wayland I mean the combined set of display server protocols and accepted protocol extensions, unless otherwise clarified.
I would have picked a picture from our lab, but that would have needed permission first
Qt has awesome platform issues pages, like for macOS and Linux/X11 which help with porting efforts, but Qt doesn t even list Linux/Wayland as supported platform. There is some information though, like window geometry peculiarities, which aren t particularly helpful when porting (but still essential to know).
Besides issues with Nvidia hardware CUDA for simulations and machine-learning is pretty much everywhere, so Nvidia cards are common, which causes trouble on Wayland still. It is improving though.

Having had success running Trisquel on the ppc64 Talos II, I felt ready to get an arm64 machine running Trisquel. I have a Ampere Altra Developer Platform from ADLINK, which is a fairly powerful desktop machine. While there were some issues during installation, I m happy to say the machine is stable and everything appears to work fine.

ISO images for non-amd64 platforms are unfortunately still hidden from the main Trisquel download area, so you will have to use the following procedure to download and extract a netinst ISO image (using debian-installer) and write it to a USB memory device. Another unfortunate problem is that there are no OpenPGP signatures or hash checksums, but below I publish one checksum.

wget -q http://builds.trisquel.org/debian-installer-images/debian-installer-images_20210731+deb11u9+11.0trisquel15_arm64.tar.gz
tar xfa debian-installer-images_20210731+deb11u9+11.0trisquel15_arm64.tar.gz ./installer-arm64/20210731+deb11u9+11/images/netboot/mini.iso
echo '311732519cc8c7c1bb2fe873f134fdafb211ef3bcb5b0d2ecdc6ea4e3b336357  installer-arm64/20210731+deb11u9+11/images/netboot/mini.iso'   sha256sum -c
sudo wipefs -a /dev/sdX
sudo dd if=installer-arm64/20210731+deb11u9+11/images/netboot/mini.iso of=/dev/sdX conv=sync status=progress

Insert the USB stick in a USB slot in the machine, and power up. Press ESCAPE at the BIOS prompt and select the USB device as the boot device. The first problem that hit me was that translations didn t work, I selected Swedish but the strings were garbled. Rebooting and selecting the default English worked fine. For installation, you need Internet connectivity and I use the RJ45 port closest to VGA/serial which is available as enP5p1s0 in the installer. I wouldn t connect the BMC RJ45 port to anything unless you understand the security implications. During installation you have to create a EFI partition for booting, and I ended up with one 1GB EFI partition, one 512GB ext4 partition for / with discard/noatime options, and a 32GB swap partition. The installer did not know about any Trisquel mirrors, but only had the default archive.trisquel.org, so if you need to use a mirror, take a note of the necessary details. The installation asks me about which kernel to install, and I went with the default linux-generic which results in a 5.15 linux-libre kernel. At the end of installation, unfortunately grub failed with a mysterious error message: Unable to install GRUB in dummy. Executing 'grub-install dummy' failed. On another console there is a better error message: failed to register the EFI boot entry. There are some references to file descriptor issues. Perhaps I partitioned the disk in a bad way, or this is a real bug in the installer for this platform. I continued installation, and it appears the installer was able to write GRUB to the device, but not add the right boot menu. So I was able to finish the installation properly, and then reboot and manually type the following GRUB commands: linux (hd0,gpt2)/boot/vmlinuz initrd (hd0,gpt2)/boot/initrd.img boot. Use the GRUB ls command to find the right device. See images below for more information.

Booting and installing GRUB again manually works fine:

root@ampel:~# update-grub
Sourcing file  /etc/default/grub'
Sourcing file  /etc/default/grub.d/background.cfg'
Sourcing file  /etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.15.0-91-generic
Found initrd image: /boot/initrd.img-5.15.0-91-generic
Found linux image: /boot/vmlinuz-5.15.0-58-generic
Found initrd image: /boot/initrd.img-5.15.0-58-generic
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
root@ampel:~#

During installation I tend to avoid selecting any tasksel components, in part because it didn t use a local mirror to gain network speed, and in part because I don t want to generate OpenSSH keys in a possibly outdated environment that is harder to audit and reproducible rebuild than the finally installed system. When I selected the OpenSSH and GNOME tasksel, I get an error, but fortunately using apt get directly is simple.

root@ampel:~# tasksel
Tasksel GNOME failed:
tasksel: apt-get failed (100)
root@ampel:~# apt-get install trisquel-gnome ssh

Graphics in GNOME was slow using the built-in ASPEED AST2500 VGA controller with linux-libre 5.15. There are kernels labeled 64k but I haven t tested them, and I m not sure they would bring any significant advantage. I simply upgraded to a more recent linux-libre 6.2 kernel via the linux-image-generic-hwe-11.0 virtual package. After a reboot, graphics in GNOME is usable.

root@ampel:~# apt-get install linux-image-generic-hwe-11.0

There seems to be some issue with power-saving inside GNOME, since the machine becomes unresponsive after 20 minutes, and I m unable to make it resume via keyboard or power button. Disabling the inactivity power setting in GNOME works fine to resolve this. I will now put this machine to some more heavy use and see how it handles it. I hope to find more suitable arm64-based servers to complement my ppc64el-based servers in the future, as this ADLINK Ampere Altra Developer Platform with liquid-cooling is more of a toy than a serious server for use in a datacentre. Happy Trisquel-on-arm64 Hacking!

I ended 2022 with a musical retrospective and very much enjoyed writing that blog post. As such, I have decided to do the same for 2023! From now on, this will probably be an annual thing :) Albums In 2023, I added 73 new albums to my collection nearly 2 albums every three weeks! I listed them below in the order in which I acquired them. I purchased most of these albums when I could and borrowed the rest at libraries. If you want to browse though, I added links to the album covers pointing either to websites where you can buy them or to Discogs when digital copies weren't available. Once again this year, it seems that Punk (mostly O !) and Metal dominate my list, mostly fueled by Angry Metal Guy and the amazing Montr al Skinhead/Punk concert scene.

Nansarunai - Ruins of the Moonlight Temple

Aurora - All My Demons Greeting Me As a Friend

High Command - Eclipse of the Dual Moons

FROZEN DAWN - The Decline of the Enlightened Gods

Komitern Sect - Des jours plus durs que d'autres

Coven - Witchcraft Destroys Minds and Reaps Souls

Los Tabascos - Sous lacrymos et les pav s

V A N I S H I N G K I D S - Miracle of Death

Fire in the Distance - Air Not Meant For Us

Concerts A trend I started in 2022 was to go to as many concerts of artists I like as possible. I'm happy to report I went to around 80% more concerts in 2023 than in 2022! Looking back at my list, April was quite a busy month... Here are the concerts I went to in 2023:

March 8th: Godspeed You! Black Emperor
April 11th: Alexandra Str liski
April 12th: Bikini Kill
April 21th: Brigada Flores Magon, Union Thugs
April 28th: Komintern Sect, The Outcasts, Violent Way, Ultra Razzia, Over the Hill
May 3rd: First Fragment
May 12th: Rhapsody of Fire, Wind Rose
May 13th: Aeternam
June 2nd: Mortier, La Gachette
June 17th: Ultra Razzia, Total Nada, BLEMISH
June 30th: Avishai Cohen Trio
July 9th: Richard Galliano
August 18th: Gojira, Mastodon, Lorna Shore
September 14th: Jinjer
September 22nd: CUIR, Salvaje Punk, Hysteric Polemix, Perestroika, Ultra Razzia, Ilusion, Over the Hill, Asbestos
October 6th: Rancoeur, Street Code, Tenaz, Mortimer, Guernica, High Anxiety

Although metalfinder continues to work as intended, I'm very glad to have discovered the Montr al underground scene has departed from Facebook/Instagram and adopted en masse Gancio, a FOSS community agenda that supports ActivityPub. Our local instance, askapunk.net is pretty much all I could ask for :) That's it for 2023!

This year was hard from a personal and work point of view, which impacted the amount of Free Software bits I ended up doing - even when I had the time I often wasn t in the right head space to make progress on things. However writing this annual recap up has been a useful exercise, as I achieved more than I realised. For previous years see 2019, 2020, 2021 + 2022.

Conferences The only Free Software related conference I made it to this year was DebConf23 in Kochi, India. Changes with projects at work meant I couldn t justify anything work related. This year I m planning to make it to FOSDEM, and haven t made a decision on DebConf24 yet.

Debian Most of my contributions to Free software continue to happen within Debian. I started the year working on retrogaming with Kodi on Debian. I got this to a much better state for bookworm, with it being possible to run the bsnes-mercury emulator under Kodi using RetroArch. There are a few other libretro backends available for RetroArch, but Kodi needs some extra controller mappings packaged up first. Plenty of uploads were involved, though some of this was aligning all the dependencies and generally cleaning things up in iterations.

retroarch: 1.13.0-dfsg+3, 1.14.0+dfsg-1, 1.15.0+dfsg-1, 1.16.0.3+dfgs-1

libretro-core-info uploads: 1.14.0-1, 1.15.0-1.

libretro-bsnes-mercury: 094+git20220807-2, 094+git20220807-3, 094+git20220807-4, 094+git20220807-5, 094+git20220807-6, 094+git20220807-7, 094+git20220807-8.

kodi-game-libretro: 20.2.0-2, 20.2.2-1, 20.2.2-2, 20.2.4-1, 20.2.4-2

rcheevos: 10.5.0-2, 10.6.0-1, 10.7.0-1, 10.7.1-1.

I continued to work on a few packages within the Debian Electronics Packaging Team. OpenOCD produced a new release in time for the bookworm release, so I uploaded 0.12.0-1. There were a few minor sigrok cleanups - sigrok 0.3, libsigrokdecode 0.5.3-4 + libsigrok 0.5.2-4 / 0.5.2-5. While I didn t manage to get the work completed I did some renaming of the ESP8266 related packages - gcc-xtensa-lx106 (which saw a 13 upload pre-bookworm) has become gcc-xtensa (with 14) and binutils-xtensa-lx106 has become binutils-xtensa (with 6). Binary packages remain the same, but this is intended to allow for the generation of ESP32 compiler toolchains from the same source. onak saw 0.6.3-1 uploaded to match the upstream release. I also uploaded libgpg-error 1.47-1 (though I can claim no credit for any of the work in preparing the package) to help move things forward on updating gnupg2 in Debian. I NMUed tpm2-pkcs11 1.9.0-0.1 to fix some minor issues pre-bookworm release; I use this package myself to store my SSH key within my laptop TPM, so I care about it being in a decent state. sg3-utils also saw a bit of love with 1.46-2 + 1.46-3 - I don t work in the storage space these days, but I m still listed as an uploaded and there was an RC bug around the library package naming that I was qualified to fix and test pre-bookworm. Related to my retroarch work I sponsored uploads of mgba for Ryan Tandy: 0.10.0+dfsg-1, 0.10.0+dfsg-2, 0.10.1+dfsg-1, 0.10.2+dfsg-1, mgba 0.10.1+dfsg-1+deb12u1. As part of the Data Protection Team I responded to various inbound queries to that team, both from project members and those external to the project. I continue to keep an eye on Debian New Members, even though I m mostly inactive as an application manager - we generally seem to have enough available recently. Mostly my involvement is via Front Desk activities, helping out with queries to the team alias, and contributing to internal discussions as well as our panel at DebConf23. Finally the 3 month rotation for Debian Keyring continues to operate smoothly. I dealt with 2023.03.24, 2023.06.26, 2023.06.29, 2023.09.10, 2023.09.24 + 2023.12.24.

Linux I had a few minor patches accepted to the kernel this year. A pair of safexcel cleanups (improved error logging for firmware load fail and cleanup on load failure) came out of upgrading the kernel running on my RB5009. The rest were related to my work on repurposing my C.H.I.P.. The AXP209 driver needed extended to support GPIO3 (with associated DT schema update). That allowed Bluetooth to be enabled. Adding the AXP209 internal temperature ADC as an iio-hwmon node means it can be tracked using the normal sensor monitoring framework. And finally I added the pinmux settings for mmc2, which I use to support an external microSD slot on my C.H.I.P.

Personal projects 2023 saw another minor release of onak, 0.6.3, which resulted in a corresponding Debian upload (0.6.3-1). It has a couple of bug fixes (including a particularly annoying, if minor, one around systemd socket activation that felt very satisfying to get to the bottom of), but I still lack the time to do any of the major changes I would like to. I wrote listadmin3 to allow easy manipulation of moderation queues for Mailman3. It s basic, but it s drastically improved my timeliness on dealing with held messages.

I know that people rave about GMail's spam filtering, but it didn't work for me: I was seeing too many false positives. I personally prefer to see some false negatives (i.e. letting some spam through), but to reduce false positives as much as possible (and ideally have a way to tune this). Here's the local SpamAssassin setup I have put together over many years. In addition to the parts I describe here, I also turn off greylisting on my email provider (KolabNow) because I don't want to have to wait for up to 10 minutes for a "2FA" email to go through. This setup assumes that you download all of your emails to your local machine. I use fetchmail for this, though similar tools should work too.

Three tiers of emails The main reason my setup works for me, despite my receiving hundreds of spam messages every day, is that I split incoming emails into three tiers via procmail:

not spam: delivered to inbox
likely spam: quarantined in a soft_spam/ folder
definitely spam: silently deleted

I only ever have to review the likely spam tier for false positives, which is on the order of 10-30 spam emails a day. I never even see the the hundreds that are silently deleted due to a very high score. This is implemented based on a threshold in my .procmailrc:

# Use spamassassin to check for spam
:0fw: .spamassassin.lock
  /usr/bin/spamassassin
# Throw away messages with a score of > 12.0
:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*
/dev/null
:0:
* ^X-Spam-Status: Yes
$HOME/Mail/soft_spam/
# Deliver all other messages
:0:
$ DEFAULT

I also use the following ~/.muttrc configuration to easily report false negatives/positives and examine my likely spam folder via a shortcut in mutt:

unignore X-Spam-Level
unignore X-Spam-Status
macro index S "c=soft_spam/\n" "Switch to soft_spam"
# Tell mutt about SpamAssassin headers so that I can sort by spam score
spam "X-Spam-Status: (Yes No), (hits score)=(-?[0-9]+\.[0-9])" "%3"
folder-hook =soft_spam 'push ol'
folder-hook =spam 'push ou'
# <Esc>d = de-register as non-spam, register as spam, move to spam folder.
macro index \ed "<enter-command>unset wait_key\n<pipe-entry>spamassassin -r\n<enter-command>set wait_key\n<save-message>=spam\n" "report the message as spam"
# <Esc>u = unregister as spam, register as non-spam, move to inbox folder.
macro index \eu "<enter-command>unset wait_key\n<pipe-entry>spamassassin -k\n<enter-command>set wait_key\n<save-message>=inbox\n" "correct the false positive (this is not spam)"

Custom SpamAssassin rules In addition to the default ruleset that comes with SpamAssassin, I've also accrued a number of custom rules over the years. The first set comes from the (now defunct) SpamAssassin Rules Emporium. The second set is the one that backs bugs.debian.org and lists.debian.org. Note this second one includes archived copies of some of the SARE rules and so I only use some of the rules in the common/ directory. Finally, I wrote a few custom rules of my own based on specific kinds of emails I have seen slip through the cracks. I haven't written any of those in a long time and I suspect some of my rules are now obsolete. You may want to do your own testing before you copy these outright. In addition to rules to match more spam, I've also written a ruleset to remove false positives in French emails coming from many of the above custom rules. I also wrote a rule to get a bonus to any email that comes with a patch:

describe FM_PATCH   Includes a patch
body FM_PATCH   /\bdiff -pruN\b/
score FM_PATCH  -1.0

since it's not very common in spam emails

SpamAssassin settings When it comes to my system-wide SpamAssassin configuration in /etc/spamassassin/, I enable the following plugins:

loadplugin Mail::SpamAssassin::Plugin::AntiVirus
loadplugin Mail::SpamAssassin::Plugin::AskDNS
loadplugin Mail::SpamAssassin::Plugin::ASN
loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
loadplugin Mail::SpamAssassin::Plugin::Bayes
loadplugin Mail::SpamAssassin::Plugin::BodyEval
loadplugin Mail::SpamAssassin::Plugin::Check
loadplugin Mail::SpamAssassin::Plugin::DKIM
loadplugin Mail::SpamAssassin::Plugin::DNSEval
loadplugin Mail::SpamAssassin::Plugin::FreeMail
loadplugin Mail::SpamAssassin::Plugin::FromNameSpoof
loadplugin Mail::SpamAssassin::Plugin::HashBL
loadplugin Mail::SpamAssassin::Plugin::HeaderEval
loadplugin Mail::SpamAssassin::Plugin::HTMLEval
loadplugin Mail::SpamAssassin::Plugin::HTTPSMismatch
loadplugin Mail::SpamAssassin::Plugin::ImageInfo
loadplugin Mail::SpamAssassin::Plugin::MIMEEval
loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
loadplugin Mail::SpamAssassin::Plugin::OLEVBMacro
loadplugin Mail::SpamAssassin::Plugin::PDFInfo
loadplugin Mail::SpamAssassin::Plugin::Phishing
loadplugin Mail::SpamAssassin::Plugin::Pyzor
loadplugin Mail::SpamAssassin::Plugin::Razor2
loadplugin Mail::SpamAssassin::Plugin::RelayEval
loadplugin Mail::SpamAssassin::Plugin::ReplaceTags
loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody
loadplugin Mail::SpamAssassin::Plugin::SpamCop
loadplugin Mail::SpamAssassin::Plugin::TextCat
loadplugin Mail::SpamAssassin::Plugin::TxRep
loadplugin Mail::SpamAssassin::Plugin::URIDetail
loadplugin Mail::SpamAssassin::Plugin::URIEval
loadplugin Mail::SpamAssassin::Plugin::VBounce
loadplugin Mail::SpamAssassin::Plugin::WelcomeListSubject
loadplugin Mail::SpamAssassin::Plugin::WLBLEval

Some of these require extra helper packages or Perl libraries to be installed. See the comments in the relevant *.pre files. My ~/.spamassassin/user_prefs file contains the following configuration:

required_hits   5
ok_locales en fr
# Bayes options
score BAYES_00 -4.0
score BAYES_40 -0.5
score BAYES_60 1.0
score BAYES_80 2.7
score BAYES_95 4.0
score BAYES_99 6.0
bayes_auto_learn 1
bayes_ignore_header X-Miltered
bayes_ignore_header X-MIME-Autoconverted
bayes_ignore_header X-Evolution
bayes_ignore_header X-Virus-Scanned
bayes_ignore_header X-Forwarded-For
bayes_ignore_header X-Forwarded-By
bayes_ignore_header X-Scanned-By
bayes_ignore_header X-Spam-Level
bayes_ignore_header X-Spam-Status

as well as manual score reductions due to false positives, and manual score increases to help push certain types of spam emails over the 12.0 definitely spam threshold. Finally, I have the FuzzyOCR package installed since it has occasionally flagged some spam that other tools had missed. It is a little resource intensive though and so you may want to avoid this one if you are filtering spam for other people. As always, feel free to leave a comment if you do something else that works well and that's not included in my setup. This is a work-in-progress.

For a while now I wanted to have direct access to the Signal database of messages and channels of my Desktop edition of Signal. I prefer the enforced end to end encryption of Signal these days for my communication with friends and family, to increase the level of safety and privacy as well as raising the cost of the mass surveillance government and non-government entities practice these days. In August I came across a nice recipe on how to use sqlcipher to extract statistics from the Signal database explaining how to do this. Unfortunately this did not work with the version of sqlcipher in Debian. The sqlcipher package is a "fork" of the sqlite package with added support for encrypted databases. Sadly the current Debian maintainer announced more than three years ago that he did not have time to maintain sqlcipher, so it seemed unlikely to be upgraded by the maintainer. I was reluctant to take on the job myself, as I have very limited experience maintaining shared libraries in Debian. After waiting and hoping for a few months, I gave up the last week, and set out to update the package. In the process I orphaned it to make it more obvious for the next person looking at it that the package need proper maintenance. The version in Debian was around five years old, and quite a lot of changes had taken place upstream into the Debian maintenance git repository. After spending a few days importing the new upstream versions, realising that upstream did not care much for SONAME versioning as I saw library symbols being both added and removed with minor version number changes to the project, I concluded that I had to do a SONAME bump of the library package to avoid surprising the reverse dependencies. I even added a simple autopkgtest script to ensure the package work as intended. Dug deep into the hole of learning shared library maintenance, I set out a few days ago to upload the new version to Debian experimental to see what the quality assurance framework in Debian had to say about the result. The feedback told me the pacakge was not too shabby, and yesterday I uploaded the latest version to Debian unstable. It should enter testing today or tomorrow, perhaps delayed by a small library transition. Armed with a new version of sqlcipher, I can now have a look at the SQL database in ~/.config/Signal/sql/db.sqlite. First, one need to fetch the encryption key from the Signal configuration using this simple JSON extraction command:

/usr/bin/jq -r '."key"' ~/.config/Signal/config.json

Assuming the result from that command is 'secretkey', which is a hexadecimal number representing the key used to encrypt the database. Next, one can now connect to the database and inject the encryption key for access via SQL to fetch information from the database. Here is an example dumping the database structure:

% sqlcipher ~/.config/Signal/sql/db.sqlite
sqlite> PRAGMA key = "x'secretkey'";
sqlite> .schema
CREATE TABLE sqlite_stat1(tbl,idx,stat);
CREATE TABLE conversations(
      id STRING PRIMARY KEY ASC,
      json TEXT,
      active_at INTEGER,
      type STRING,
      members TEXT,
      name TEXT,
      profileName TEXT
    , profileFamilyName TEXT, profileFullName TEXT, e164 TEXT, serviceId TEXT, groupId TEXT, profileLastFetchedAt INTEGER);
CREATE TABLE identityKeys(
      id STRING PRIMARY KEY ASC,
      json TEXT
    );
CREATE TABLE items(
      id STRING PRIMARY KEY ASC,
      json TEXT
    );
CREATE TABLE sessions(
      id TEXT PRIMARY KEY,
      conversationId TEXT,
      json TEXT
    , ourServiceId STRING, serviceId STRING);
CREATE TABLE attachment_downloads(
    id STRING primary key,
    timestamp INTEGER,
    pending INTEGER,
    json TEXT
  );
CREATE TABLE sticker_packs(
    id TEXT PRIMARY KEY,
    key TEXT NOT NULL,
    author STRING,
    coverStickerId INTEGER,
    createdAt INTEGER,
    downloadAttempts INTEGER,
    installedAt INTEGER,
    lastUsed INTEGER,
    status STRING,
    stickerCount INTEGER,
    title STRING
  , attemptedStatus STRING, position INTEGER DEFAULT 0 NOT NULL, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync
      INTEGER DEFAULT 0 NOT NULL);
CREATE TABLE stickers(
    id INTEGER NOT NULL,
    packId TEXT NOT NULL,
    emoji STRING,
    height INTEGER,
    isCoverOnly INTEGER,
    lastUsed INTEGER,
    path STRING,
    width INTEGER,
    PRIMARY KEY (id, packId),
    CONSTRAINT stickers_fk
      FOREIGN KEY (packId)
      REFERENCES sticker_packs(id)
      ON DELETE CASCADE
  );
CREATE TABLE sticker_references(
    messageId STRING,
    packId TEXT,
    CONSTRAINT sticker_references_fk
      FOREIGN KEY(packId)
      REFERENCES sticker_packs(id)
      ON DELETE CASCADE
  );
CREATE TABLE emojis(
    shortName TEXT PRIMARY KEY,
    lastUsage INTEGER
  );
CREATE TABLE messages(
        rowid INTEGER PRIMARY KEY ASC,
        id STRING UNIQUE,
        json TEXT,
        readStatus INTEGER,
        expires_at INTEGER,
        sent_at INTEGER,
        schemaVersion INTEGER,
        conversationId STRING,
        received_at INTEGER,
        source STRING,
        hasAttachments INTEGER,
        hasFileAttachments INTEGER,
        hasVisualMediaAttachments INTEGER,
        expireTimer INTEGER,
        expirationStartTimestamp INTEGER,
        type STRING,
        body TEXT,
        messageTimer INTEGER,
        messageTimerStart INTEGER,
        messageTimerExpiresAt INTEGER,
        isErased INTEGER,
        isViewOnce INTEGER,
        sourceServiceId TEXT, serverGuid STRING NULL, sourceDevice INTEGER, storyId STRING, isStory INTEGER
        GENERATED ALWAYS AS (type IS 'story'), isChangeCreatedByUs INTEGER NOT NULL DEFAULT 0, isTimerChangeFromSync INTEGER
        GENERATED ALWAYS AS (
          json_extract(json, '$.expirationTimerUpdate.fromSync') IS 1
        ), seenStatus NUMBER default 0, storyDistributionListId STRING, expiresAt INT
        GENERATED ALWAYS
        AS (ifnull(
          expirationStartTimestamp + (expireTimer * 1000),
          9007199254740991
        )), shouldAffectActivity INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), shouldAffectPreview INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), isUserInitiatedMessage INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'group-v2-change',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), mentionsMe INTEGER NOT NULL DEFAULT 0, isGroupLeaveEvent INTEGER
        GENERATED ALWAYS AS (
          type IS 'group-v2-change' AND
          json_array_length(json_extract(json, '$.groupV2Change.details')) IS 1 AND
          json_extract(json, '$.groupV2Change.details[0].type') IS 'member-remove' AND
          json_extract(json, '$.groupV2Change.from') IS NOT NULL AND
          json_extract(json, '$.groupV2Change.from') IS json_extract(json, '$.groupV2Change.details[0].aci')
        ), isGroupLeaveEventFromOther INTEGER
        GENERATED ALWAYS AS (
          isGroupLeaveEvent IS 1
          AND
          isChangeCreatedByUs IS 0
        ), callId TEXT
        GENERATED ALWAYS AS (
          json_extract(json, '$.callId')
        ));
CREATE TABLE sqlite_stat4(tbl,idx,neq,nlt,ndlt,sample);
CREATE TABLE jobs(
        id TEXT PRIMARY KEY,
        queueType TEXT STRING NOT NULL,
        timestamp INTEGER NOT NULL,
        data STRING TEXT
      );
CREATE TABLE reactions(
        conversationId STRING,
        emoji STRING,
        fromId STRING,
        messageReceivedAt INTEGER,
        targetAuthorAci STRING,
        targetTimestamp INTEGER,
        unread INTEGER
      , messageId STRING);
CREATE TABLE senderKeys(
        id TEXT PRIMARY KEY NOT NULL,
        senderId TEXT NOT NULL,
        distributionId TEXT NOT NULL,
        data BLOB NOT NULL,
        lastUpdatedDate NUMBER NOT NULL
      );
CREATE TABLE unprocessed(
        id STRING PRIMARY KEY ASC,
        timestamp INTEGER,
        version INTEGER,
        attempts INTEGER,
        envelope TEXT,
        decrypted TEXT,
        source TEXT,
        serverTimestamp INTEGER,
        sourceServiceId STRING
      , serverGuid STRING NULL, sourceDevice INTEGER, receivedAtCounter INTEGER, urgent INTEGER, story INTEGER);
CREATE TABLE sendLogPayloads(
        id INTEGER PRIMARY KEY ASC,
        timestamp INTEGER NOT NULL,
        contentHint INTEGER NOT NULL,
        proto BLOB NOT NULL
      , urgent INTEGER, hasPniSignatureMessage INTEGER DEFAULT 0 NOT NULL);
CREATE TABLE sendLogRecipients(
        payloadId INTEGER NOT NULL,
        recipientServiceId STRING NOT NULL,
        deviceId INTEGER NOT NULL,
        PRIMARY KEY (payloadId, recipientServiceId, deviceId),
        CONSTRAINT sendLogRecipientsForeignKey
          FOREIGN KEY (payloadId)
          REFERENCES sendLogPayloads(id)
          ON DELETE CASCADE
      );
CREATE TABLE sendLogMessageIds(
        payloadId INTEGER NOT NULL,
        messageId STRING NOT NULL,
        PRIMARY KEY (payloadId, messageId),
        CONSTRAINT sendLogMessageIdsForeignKey
          FOREIGN KEY (payloadId)
          REFERENCES sendLogPayloads(id)
          ON DELETE CASCADE
      );
CREATE TABLE preKeys(
        id STRING PRIMARY KEY ASC,
        json TEXT
      , ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE signedPreKeys(
        id STRING PRIMARY KEY ASC,
        json TEXT
      , ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE badges(
        id TEXT PRIMARY KEY,
        category TEXT NOT NULL,
        name TEXT NOT NULL,
        descriptionTemplate TEXT NOT NULL
      );
CREATE TABLE badgeImageFiles(
        badgeId TEXT REFERENCES badges(id)
          ON DELETE CASCADE
          ON UPDATE CASCADE,
        'order' INTEGER NOT NULL,
        url TEXT NOT NULL,
        localPath TEXT,
        theme TEXT NOT NULL
      );
CREATE TABLE storyReads (
        authorId STRING NOT NULL,
        conversationId STRING NOT NULL,
        storyId STRING NOT NULL,
        storyReadDate NUMBER NOT NULL,
        PRIMARY KEY (authorId, storyId)
      );
CREATE TABLE storyDistributions(
        id STRING PRIMARY KEY NOT NULL,
        name TEXT,
        senderKeyInfoJson STRING
      , deletedAtTimestamp INTEGER, allowsReplies INTEGER, isBlockList INTEGER, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync INTEGER);
CREATE TABLE storyDistributionMembers(
        listId STRING NOT NULL REFERENCES storyDistributions(id)
          ON DELETE CASCADE
          ON UPDATE CASCADE,
        serviceId STRING NOT NULL,
        PRIMARY KEY (listId, serviceId)
      );
CREATE TABLE uninstalled_sticker_packs (
        id STRING NOT NULL PRIMARY KEY,
        uninstalledAt NUMBER NOT NULL,
        storageID STRING,
        storageVersion NUMBER,
        storageUnknownFields BLOB,
        storageNeedsSync INTEGER NOT NULL
      );
CREATE TABLE groupCallRingCancellations(
        ringId INTEGER PRIMARY KEY,
        createdAt INTEGER NOT NULL
      );
CREATE TABLE IF NOT EXISTS 'messages_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'messages_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS 'messages_fts_content'(id INTEGER PRIMARY KEY, c0);
CREATE TABLE IF NOT EXISTS 'messages_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
CREATE TABLE IF NOT EXISTS 'messages_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
CREATE TABLE edited_messages(
        messageId STRING REFERENCES messages(id)
          ON DELETE CASCADE,
        sentAt INTEGER,
        readStatus INTEGER
      , conversationId STRING);
CREATE TABLE mentions (
        messageId REFERENCES messages(id) ON DELETE CASCADE,
        mentionAci STRING,
        start INTEGER,
        length INTEGER
      );
CREATE TABLE kyberPreKeys(
        id STRING PRIMARY KEY NOT NULL,
        json TEXT NOT NULL, ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE callsHistory (
        callId TEXT PRIMARY KEY,
        peerId TEXT NOT NULL, -- conversation id (legacy)   uuid   groupId   roomId
        ringerId TEXT DEFAULT NULL, -- ringer uuid
        mode TEXT NOT NULL, -- enum "Direct"   "Group"
        type TEXT NOT NULL, -- enum "Audio"   "Video"   "Group"
        direction TEXT NOT NULL, -- enum "Incoming"   "Outgoing
        -- Direct: enum "Pending"   "Missed"   "Accepted"   "Deleted"
        -- Group: enum "GenericGroupCall"   "OutgoingRing"   "Ringing"   "Joined"   "Missed"   "Declined"   "Accepted"   "Deleted"
        status TEXT NOT NULL,
        timestamp INTEGER NOT NULL,
        UNIQUE (callId, peerId) ON CONFLICT FAIL
      );
[ dropped all indexes to save space in this blog post ]
CREATE TRIGGER messages_on_view_once_update AFTER UPDATE ON messages
      WHEN
        new.body IS NOT NULL AND new.isViewOnce = 1
      BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
      END;
CREATE TRIGGER messages_on_insert AFTER INSERT ON messages
      WHEN new.isViewOnce IS NOT 1 AND new.storyId IS NULL
      BEGIN
        INSERT INTO messages_fts
          (rowid, body)
        VALUES
          (new.rowid, new.body);
      END;
CREATE TRIGGER messages_on_delete AFTER DELETE ON messages BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
        DELETE FROM sendLogPayloads WHERE id IN (
          SELECT payloadId FROM sendLogMessageIds
          WHERE messageId = old.id
        );
        DELETE FROM reactions WHERE rowid IN (
          SELECT rowid FROM reactions
          WHERE messageId = old.id
        );
        DELETE FROM storyReads WHERE storyId = old.storyId;
      END;
CREATE VIRTUAL TABLE messages_fts USING fts5(
        body,
        tokenize = 'signal_tokenizer'
      );
CREATE TRIGGER messages_on_update AFTER UPDATE ON messages
      WHEN
        (new.body IS NULL OR old.body IS NOT new.body) AND
         new.isViewOnce IS NOT 1 AND new.storyId IS NULL
      BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
        INSERT INTO messages_fts
          (rowid, body)
        VALUES
          (new.rowid, new.body);
      END;
CREATE TRIGGER messages_on_insert_insert_mentions AFTER INSERT ON messages
      BEGIN
        INSERT INTO mentions (messageId, mentionAci, start, length)
        
    SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
      bodyRanges.value ->> 'start' as start,
      bodyRanges.value ->> 'length' as length
    FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
    WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
  
        AND messages.id = new.id;
      END;
CREATE TRIGGER messages_on_update_update_mentions AFTER UPDATE ON messages
      BEGIN
        DELETE FROM mentions WHERE messageId = new.id;
        INSERT INTO mentions (messageId, mentionAci, start, length)
        
    SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
      bodyRanges.value ->> 'start' as start,
      bodyRanges.value ->> 'length' as length
    FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
    WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
  
        AND messages.id = new.id;
      END;
sqlite>

Finally I have the tool needed to inspect and process Signal messages that I need, without using the vendor provided client. Now on to transforming it to a more useful format. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Recently I completed a small project, including an embedded microcontroller. For me, using the popular Arduino IDE, and C, was a mistake. The experience with Rust was better, but still very exciting, and not in a good way. Here follows the rant.

Introduction In a recent project (I ll write about the purpose, and the hardware in another post) I chose to use a DigiSpark board. This is a small board with a USB-A tongue (but not a proper plug), and an ATTiny85 microcontroller, This chip has 8 pins and is quite small really, but it was plenty for my application. By choosing something popular, I hoped for convenient hardware, and an uncomplicated experience. Convenient hardware, I got. Arduino IDE The usual way to program these boards is via an IDE. I thought I d go with the flow and try that. I knew these were closely related to actual Arduinos and saw that the IDE package arduino was in Debian. But it turns out that the Debian package s version doesn t support the DigiSpark. (AFAICT from the list it offered me, I m not sure it supports any ATTiny85 board.) Also, disturbingly, its board manager seemed to be offering to install board support, suggesting it would download stuff from the internet and run it. That wouldn t be acceptable for my main laptop. I didn t expect to be doing much programming or debugging, and the project didn t have significant security requirements: the chip, in my circuit, has only a very narrow ability do anything to the real world, and no network connection of any kind. So I thought it would be tolerable to do the project on my low-security video laptop . That s the machine where I m prepared to say yes to installing random software off the internet. So I went to the upstream Arduino site and downloaded a tarball containing the Arduino IDE. After unpacking that in /opt it ran and produced a pointy-clicky IDE, as expected. I had already found a 3rd-party tutorial saying I needed to add a magic URL (from the DigiSpark s vendor) in the preferences. That indeed allowed it to download a whole pile of stuff. Compilers, bootloader clients, god knows what. However, my tiny test program didn t make it to the board. Half-buried in a too-small window was an error message about the board s bootloader ( Micronucleus ) being too new. The boards I had came pre-flashed with micronucleus 2.2. Which is hardly new, But even so the official Arduino IDE (or maybe the DigiSpark s board package?) still contains an old version. So now we have all the downsides of curl bash-ware, but we re lacking the it s up to date and it just works upsides. Further digging found some random forum posts which suggested simply downloading a newer micronucleus and manually stuffing it into the right place: one overwrites a specific file, in the middle the heaps of stuff that the Arduino IDE s board support downloader squirrels away in your home directory. (In my case, the home directory of the untrusted shared user on the video laptop,) So, whatever . I did that. And it worked! Having demo d my ability to run code on the board, I set about writing my program. Writing C again The programming language offered via the Arduino IDE is C. It s been a little while since I started a new thing in C. After having spent so much of the last several years writing Rust. C s primitiveness quickly started to grate, and the program couldn t easily be as DRY as I wanted (Don t Repeat Yourself, see Wilson et al, 2012, 4, p.6). But, I carried on; after all, this was going to be quite a small job. Soon enough I had a program that looked right and compiled. Before testing it in circuit, I wanted to do some QA. So I wrote a simulator harness that #included my Arduino source file, and provided imitations of the few Arduino library calls my program used. As an side advantage, I could build and run the simulation on my main machine, in my normal development environment (Emacs, make, etc.). The simulator runs confirmed the correct behaviour. (Perhaps there would have been some more faithful simulation tool, but the Arduino IDE didn t seem to offer it, and I wasn t inclined to go further down that kind of path.) So I got the video laptop out, and used the Arduino IDE to flash the program. It didn t run properly. It hung almost immediately. Some very ad-hoc debugging via led-blinking (like printf debugging, only much worse) convinced me that my problem was as follows: Arduino C has 16-bit ints. My test harness was on my 64-bit Linux machine. C was autoconverting things (when building for the micrcocontroller). The way the Arduino IDE ran the compiler didn t pass the warning options necessary to spot narrowing implicit conversions. Those warnings aren t the default in C in general ~~because C compilers hate us all~~ for compatibility reasons. I don t know why those warnings are not the default in the Arduino IDE, but my guess is that they didn t want to bother poor novice programmers with messages from the compiler explaining how their program is quite possibly wrong. After all, users don t like error messages so we shouldn t report errors. And novice programmers are especially fazed by error messages so it s better to just let them struggle themselves with the arcane mysteries of undefined behaviour in C? The Arduino IDE does offer a dropdown for compiler warnings . The default is None. Setting it to All didn t produce anything about my integer overflow bugs. And, the output was very hard to find anyway because the log window has a constant stream of strange messages from javax.jmdns, with hex DNS packet dumps. WTF. Other things that were vexing about the Arduino IDE: it has fairly fixed notions (which don t seem to be documented) about how your files and directories ought to be laid out, and magical machinery for finding things you put nearby its sketch (as it calls them) and sticking them in its ear, causing lossage. It has a tendency to become confused if you edit files under its feet (e.g. with git checkout). It wasn t really very suited to a workflow where principal development occurs elsewhere. And, important settings such as the project s clock speed, or even the target board, or the compiler warning settings to use weren t stored in the project directory along with the actual code. I didn t look too hard, but I presume they must be in a dotfile somewhere. This is madness. Apparently there is an Arduino CLI too. But I was already quite exasperated, and I didn t like the idea of going so far off the beaten path, when the whole point of using all this was to stay with popular tooling and share fate with others. (How do these others cope? I have no idea.) As for the integer overflow bug: I didn t seriously consider trying to figure out how to control in detail the C compiler options passed by the Arduino IDE. (Perhaps this is possible, but not really documented?) I did consider trying to run a cross-compiler myself from the command line, with appropriate warning options, but that would have involved providing (or stubbing, again) the Arduino/DigiSpark libraries (and bugs could easily lurk at that interface). Instead, I thought, if only I had written the thing in Rust . But that wasn t possible, was it? Does Rust even support this board? Rust on the DigiSpark I did a cursory web search and found a very useful blog post by Dylan Garrett. This encouraged me to think it might be a workable strategy. I looked at the instructions there. It seemed like I could run them via the privsep arrangement I use to protect myself when developing using upstream cargo packages from crates.io. I got surprisingly far surprisingly quickly. It did, rather startlingly, cause my rustup to download a random recent Nightly Rust, but I have six of those already for other Reasons. Very quickly I got the trinket LED blink example, referenced by Dylan s blog post, to compile. Manually copying the file to the video laptop allowed me to run the previously-downloaded micronucleus executable and successfully run the blink example on my board! I thought a more principled approach to the bootloader client might allow a more convenient workflow. I found the upstream Micronucleus git releases and tags, and had a look over its source code, release dates, etc. It seemed plausible, so I compiled v2.6 from source. That was a success: now I could build and install a Rust program onto my board, from the command line, on my main machine. No more pratting about with the video laptop. I had got further, more quickly, with Rust, than with the Arduino IDE, and the outcome and workflow was superior. So, basking in my success, I copied the directory containing the example into my own project, renamed it, and adjusted the path references. That didn t work. Now it didn t build. Even after I copied about .cargo/config.toml and rust-toolchain.toml it didn t build, producing a variety of exciting messages, depending what precisely I tried. I don t have detailed logs of my flailing: the instructions say to build it by cd ing to the subdirectory, and, given that what I was trying to do was to not follow those instructions, it didn t seem sensible to try to prepare a proper repro so I could file a ticket. I wasn t optimistic about investigating it more deeply myself: I have some experience of fighting cargo, and it s not usually fun. Looking at some of the build control files, things seemed quite complicated. Additionally, not all of the crates are on crates.io. I have no idea why not. So, I would need to supply local copies of them anyway. I decided to just git subtree add the avr-hal git tree. (That seemed better than the approach taken by the avr-hal project s cargo template, since that template involve a cargo dependency on a foreign git repository. Perhaps it would be possible to turn them into path dependencies, but given that I had evidence of file-location-sensitive behaviour, which I didn t feel like I wanted to spend time investigating, using that seems like it would possibly have invited more trouble. Also, I don t like package templates very much. They re a form of clone-and-hack: you end up stuck with whatever bugs or oddities exist in the version of the template which was current when you started.) Since I couldn t get things to build outside avr-hal, I edited the example, within avr-hal, to refer to my (one) program.rs file outside avr-hal, with a #[path] instruction. That s not pretty, but it worked. I also had to write a nasty shell script to work around the lack of good support in my nailing-cargo privsep tool for builds where cargo must be invoked in a deep subdirectory, and/or Cargo.lock isn t where it expects, and/or the target directory containing build products is in a weird place. It also has to filter the output from cargo to adjust the pathnames in the error messages. Otherwise, running both cd A; cargo build and cd B; cargo build from a Makefile produces confusing sets of error messages, some of which contain filenames relative to A and some relative to B, making it impossible for my Emacs to reliably find the right file. RIIR (Rewrite It In Rust) Having got my build tooling sorted out I could go back to my actual program. I translated the main program, and the simulator, from C to Rust, more or less line-by-line. I made the Rust version of the simulator produce the same output format as the C one. That let me check that the two programs had the same (simulated) behaviour. Which they did (after fixing a few glitches in the simulator log formatting). Emboldened, I flashed the Rust version of my program to the DigiSpark. It worked right away! RIIR had caused the bug to vanish. Of course, to rewrite the program in Rust, and get it to compile, it was necessary to be careful about the types of all the various integers, so that s not so surprising. Indeed, it was the point. I was then able to refactor the program to be a bit more natural and DRY, and improve some internal interfaces. Rust s greater power, compared to C, made those cleanups easier, so making them worthwhile. However, when doing real-world testing I found a weird problem: my timings were off. Measured, the real program was too fast by a factor of slightly more than 2. A bit of searching (and searching my memory) revealed the cause: I was using a board template for an Adafruit Trinket. The Trinket has a clock speed of 8MHz. But the DigiSpark runs at 16.5MHz. (This is discussed in a ticket against one of the C/C++ libraries supporting the ATTiny85 chip.) The Arduino IDE had offered me a choice of clock speeds. I have no idea how that dropdown menu took effect; I suspect it was adding prelude code to adjust the clock prescaler. But my attempts to mess with the CPU clock prescaler register by hand at the start of my Rust program didn t bear fruit. So instead, I adopted a bodge: since my code has (for code structure reasons, amongst others) only one place where it dealt with the underlying hardware s notion of time, I simply changed my delay function to adjust the passed-in delay values, compensating for the wrong clock speed. There was probably a more principled way. For example I could have (re)based my work on either of the two unmerged open MRs which added proper support for the DigiSpark board, rather than abusing the Adafruit Trinket definition. But, having a nearly-working setup, and an explanation for the behaviour, I preferred the narrower fix to reopening any cans of worms. An offer of help As will be obvious from this posting, I m not an expert in dev tools for embedded systems. Far from it. This area seems like quite a deep swamp, and I m probably not the person to help drain it. (Frankly, much of the improvement work ought to be done, and paid for, by hardware vendors.) But, as a full Member of the Debian Project, I have considerable gatekeeping authority there. I also have much experience of software packaging, build systems, and release management. If anyone wants to try to improve the situation with embedded tooling in Debian, and is willing to do the actual packaging work. I would be happy to advise, and to review and sponsor your contributions. An obvious candidate: it seems to me that micronucleus could easily be in Debian. Possibly a DigiSpark board definition could be provided to go with the arduino package. Unfortunately, IMO Debian s Rust packaging tooling and workflows are very poor, and the first of my suggestions for improvement wasn t well received. So if you need help with improving Rust packages in Debian, please talk to the Debian Rust Team yourself. Conclusions Embedded programming is still rather a mess and probably always will be. Embedded build systems can be bizarre. Documentation is scant. You re often expected to download board support packages full of mystery binaries, from the board vendor (or others). Dev tooling is maddening, especially if aimed at novice programmers. You want version control? Hermetic tracking of your project s build and install configuration? Actually to be told by the compiler when you write obvious bugs? You re way off the beaten track. As ever, Free Software is under-resourced and the maintainers are often busy, or (reasonably) have other things to do with their lives. All is not lost Rust can be a significantly better bet than C for embedded software: The Rust compiler will catch a good proportion of programming errors, and an experienced Rust programmer can arrange (by suitable internal architecture) to catch nearly all of them. When writing for a chip in the middle of some circuit, where debugging involves staring an LED or a multimeter, that s precisely what you want. Rust embedded dev tooling was, in this case, considerably better. Still quite chaotic and strange, and less mature, perhaps. But: significantly fewer mystery downloads, and significantly less crazy deviations from the language s normal build system. Overall, less bad software supply chain integrity. The ATTiny85 chip, and the DigiSpark board, served my hardware needs very well. (More about the hardware aspects of this project in a future posting.)

comments

Welcome to the August 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to the project, please visit our Contribute page on our website.

Rust serialisation library moving to precompiled binaries Bleeping Computer reported that Serde, a popular Rust serialization framework, had decided to ship its `serde_derive` macro as a precompiled binary. As Ax Sharma writes:
The move has generated a fair amount of push back among developers who worry about its future legal and technical implications, along with a potential for supply chain attacks, should the maintainer account publishing these binaries be compromised.
After intensive discussions, use of the precompiled binary was phased out.

Reproducible builds, the first ten years On August 4th, Holger Levsen gave a talk at BornHack 2023 on the Danish island of Funen titled Reproducible Builds, the first ten years which promised to contain:
[ ] an overview about reproducible builds, the past, the presence and the future. How it started with a small [meeting] at DebConf13 (and before), how it grew from being a Debian effort to something many projects work on together, until in 2021 it was mentioned in an executive order of the president of the United States. (HTML slides)
Holger repeated the talk later in the month at Chaos Communication Camp 2023 in Zehdenick, Germany: A video of the talk is available online, as are the HTML slides.

Reproducible Builds Summit Just another reminder that our upcoming Reproducible Builds Summit is set to take place from October 31st November 2nd 2023 in Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. If you re interested in joining us this year, please make sure to read the event page, the news item, or the invitation email that Mattia Rizzolo sent out, which have more details about the event and location. We are also still looking for sponsors to support the event, so do reach out to the organizing team if you are able to help. (Also of note that PackagingCon 2023 is taking place in Berlin just before our summit, and their schedule has just been published.)

Vagrant Cascadian on the Sustain podcast Vagrant Cascadian was interviewed on the SustainOSS podcast on reproducible builds:
Vagrant walks us through his role in the project where the aim is to ensure identical results in software builds across various machines and times, enhancing software security and creating a seamless developer experience. Discover how this mission, supported by the Software Freedom Conservancy and a broad community, is changing the face of Linux distros, Arch Linux, openSUSE, and F-Droid. They also explore the challenges of managing random elements in software, and Vagrant s vision to make reproducible builds a standard best practice that will ideally become automatic for users. Vagrant shares his work in progress and their commitment to the last mile problem.
The episode is available to listen (or download) from the Sustain podcast website. As it happens, the episode was recorded at FOSSY 2023, and the video of Vagrant s talk from this conference (Breaking the Chains of Trusting Trust is now available on Archive.org: It was also announced that Vagrant Cascadian will be presenting at the Open Source Firmware Conference in October on the topic of Reproducible Builds All The Way Down.

On our mailing list Carles Pina i Estany wrote to our mailing list during August with an interesting question concerning the practical steps to reproduce the `hello-traditional` package from Debian. The entire thread can be viewed from the archive page, as can Vagrant Cascadian s reply.

Website updates Rahul Bajaj updated our website to add a series of environment variations related to reproducible builds [ ], Russ Cox added the Go programming language to our projects page [ ] and Vagrant Cascadian fixed a number of broken links and typos around the website [ ][ ][ ].

Software development In diffoscope development this month, versions `247`, `248` and `249` were uploaded to Debian unstable by Chris Lamb, who also added documentation for the new `specialize_as` method and expanding the documentation of the existing `specialize` as well [ ]. In addition, Fay Stegerman added `specialize_as` and used it to optimise `.smali` comparisons when decompiling Android `.apk` files [ ], Felix Yan and Mattia Rizzolo corrected some typos in code comments [ , ], Greg Chabala merged the RUN commands into single layer in the package s `Dockerfile` [ ] thus greatly reducing the final image size. Lastly, Roland Clobus updated tool descriptions to mark that the `xb-tool` has moved package within Debian [ ].
reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Vagrant Cascadian updated the packaging to be compatible with Tox version 4. This was originally filed as Debian bug #1042918 and Holger Levsen uploaded this to change to Debian unstable as version 0.7.26 [ ].

Distribution work In Debian, 28 reviews of Debian packages were added, 14 were updated and 13 were removed this month adding to our knowledge about identified issues. A number of issue types were added, including Chris Lamb adding a new `timestamp_in_documentation_using_sphinx_zzzeeksphinx_theme` toolchain issue.
In August, F-Droid added 25 new reproducible apps and saw 2 existing apps switch to reproducible builds, making 191 apps in total that are published with Reproducible Builds and using the upstream developer s signature. [ ]
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann:

`arimo` (modification time in build results)

`apptainer` (random Go build identifier)

`arrow` (fails to build on single-CPU machines)

`camlp` (parallelism-related issue)

`developer` (Go ordering-related issue)

`elementary-xfce-icon-theme` (font-related problem)

`gegl` (parallelism issue)

`grommunio` (filesystem ordering issue)

`grpc` (drop nondetermistic log)

`guile-parted` (parallelism-related issue)

`icinga` (hostname-based issue)

`liquid-dsp` (CPU-oriented problem)

`memcached` (package fails to build far in the future)

`openmpi5/openpmix` (date/copyright year issue)

`openmpi5` (date/copyright year issue)

`orthanc-ohif+orthanc-volview` (ordering related issue plus timestamp in a Gzip)

`perl-Net-DNS` (package fails to build far in the future)

`postgis` (parallelism issue)

`python-scipy` (uses an arbitrary build path)

`python-trustme` (package fails to build far in the future)

`qtbase/qmake/goldendict-ng` (timestamp-related issue)

`qtox` (date-related issue)

`ring` (filesytem ordering related issue)

`scipy` (1 & 2) (drop arbtirary build path and filesytem-ordering issue)

`snimpy` (1 & 3) (fails to build on single-CPU machines as well far in the future)

`tango-icon-theme` (font-related issue)

Chris Lamb:

#1042954 filed against `libcerf`.

#1042955 filed against `zzzeeksphinx`.

#1043330 filed against `tox`.

#1050357 filed against `pytds`.

#1050727 filed against `zlib`.

#1050944 filed against `jtreg6`.

#1050955 filed against `rpy2`.

Rebecca N. Palmer:

#1050619 filed against `nbclient`.

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In August, a number of changes were made by Holger Levsen:

Debian-related changes:

Disable Debian live image creation jobs until an OpenQA credential problem has been fixed. [ ]

Run our maintenance scripts every 3 hours instead of every 2. [ ]

Export data for unstable to the `reproducible-tracker.json` data file. [ ]

Stop varying the build path, we want reproducible builds. [ ]

Temporarily stop updating the `pbuilder.tgz` for Debian unstable due to #1050784. [ ][ ]

Correctly document that we are not variying `usrmerge`. [ ][ ]

Mark two `armhf` nodes (`wbq0` and `jtx1a`) as down; investigation is needed. [ ]

Misc:

Force reconfiguration of all Jenkins jobs, due to the recent rise of zombie processes. [ ]

In the node health checks, also try to restart failed `ntpsec`, `postfix` and `vnstat` services. [ ][ ][ ]

System health checks:

Detect Debian live build failures due to missing credentials. [ ][ ]

Ignore specific types of known zombie processes. [ ][ ]

In addition, Vagrant Cascadian updated the scripts to use a predictable build path that is consistent with the one used on `buildd.debian.org`. [ ][ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Mailing list: `rb-general@lists.reproducible-builds.org`

Mastodon: @reproducible_builds@fosstodon.org

Twitter: @ReproBuilds

This is a short announcement to say that I have changed my main OpenPGP key. A signed statement is available with the cryptographic details but, in short, the reason is that I stopped using my old YubiKey NEO that I have worn on my keyring since 2015. I now have a YubiKey 5 which supports ED25519 which features much shorter keys and faster decryption. It allowed me to move all my secret subkeys on the key (including encryption keys) while retaining reasonable performance. I have written extensive documentation on how to do that OpenPGP key rotation and also YubiKey OpenPGP operations.

Warning on storing encryption keys on a YubiKey People wishing to move their private encryption keys to such a security token should be very careful as there are special precautions to take for disaster recovery. I am toying with the idea of writing an article specifically about disaster recovery for secrets and backups, dealing specifically with cases of death or disabilities.

Autocrypt changes One nice change is the impact on Autocrypt headers, which are considerably shorter. Before, the header didn't even fit on a single line in an email, it overflowed to five lines:

Autocrypt: addr=anarcat@torproject.org; prefer-encrypt=nopreference;
 keydata=xsFNBEogKJ4BEADHRk8dXcT3VmnEZQQdiAaNw8pmnoRG2QkoAvv42q9Ua+DRVe/yAEUd03EOXbMJl++YKWpVuzSFr7IlZ+/lJHOCqDeSsBD6LKBSx/7uH2EOIDizGwfZNF3u7X+gVBMy2V7rTClDJM1eT9QuLMfMakpZkIe2PpGE4g5zbGZixn9er+wEmzk2mt20RImMeLK3jyd6vPb1/Ph9+bTEuEXi6/WDxJ6+b5peWydKOdY1tSbkWZgdi+Bup72DLUGZATE3+Ju5+rFXtb/1/po5dZirhaSRZjZA6sQhyFM/ZhIj92mUM8JJrhkeAC0iJejn4SW8ps2NoPm0kAfVu6apgVACaNmFb4nBAb2k1KWru+UMQnV+VxDVdxhpV628Tn9+8oDg6c+dO3RCCmw+nUUPjeGU0k19S6fNIbNPRlElS31QGL4H0IazZqnE+kw6ojn4Q44h8u7iOfpeanVumtp0lJs6dE2nRw0EdAlt535iQbxHIOy2x5m9IdJ6q1wWFFQDskG+ybN2Qy7SZMQtjjOqM+CmdeAnQGVwxowSDPbHfFpYeCEb+Wzya337Jy9yJwkfa+V7e7Lkv9/OysEsV4hJrOh8YXu9a4qBWZvZHnIO7zRbz7cqVBKmdrL2iGqpEUv/x5onjNQwpjSVX5S+ZRBZTzah0w186IpXVxsU8dSk0yeQskblrwARAQABzSlBbnRvaW5lIEJlYXVwcsOpIDxhbmFyY2F0QHRvcnByb2plY3Qub3JnPsLBlAQTAQgAPgIbAwULCQgHAwUVCgkICwUWAgMBAAIeAQIXgBYhBI3JAc5kFGwEitUPu3khUlJ7dZIeBQJihnFIBQkacFLiAAoJEHkhUlJ7dZIeXNAP/RsX+27l9K5uGspEaMH6jabAFTQVWD8Ch1om9YvrBgfYtq2k/m4WlkMh9IpT89Ahmlf0eq+V1Vph4wwXBS5McK0dzoFuHXJa1WHThNMaexgHhqJOs
 S60bWyLH4QnGxNaOoQvuAXiCYV4amKl7hSuDVZEn/9etDgm/UhGn2KS3yg0XFsqI7V/3RopHiDT+k7+zpAKd3st2V74w6ht+EFp2Gj0sNTBoCdbmIkRhiLyH9S4B+0Z5dUCUEopGIKKOSbQwyD5jILXEi7VTZhN0CrwIcCuqNo7OXI6e8gJd8McymqK4JrVoCipJbLzyOLxZMxGz8Ki0b9O844/DTzwcYcg9I1qogCsGmZfgVze2XtGxY+9zwSpeCLeef6QOPQ0uxsEYSfVgS+onCesSRCgwAPmppPiva+UlGuIMun87gPpQpV2fqFg/V8zBxRvs6YTGcfcQjfMoBHmZTGb+jk1//QAgnXMO7fGG38YH7iQSSzkmodrH2s27ZKgUTHVxpBL85ptftuRqbR7MzIKXZsKdA88kjIKKXwMmez9L1VbJkM4k+1Kzc5KdVydwi+ujpNegF6ZU8KDNFiN9TbDOlRxK5R+AjwdS8ZOIa4nci77KbNF9OZuO3l/FZwiKp8IFJ1nK7uiKUjmCukL0od/6X2rJtAzJmO5Co93ZVrd5r48oqUvjklzzsBNBFmeC3oBCADEV28RKzbv3dEbOocOsJQWr1R0EHUcbS270CrQZfb9VCZWkFlQ/1ypqFFQSjmmUGbNX2CG5mivVsW6Vgm7gg8HEnVCqzL02BPY4OmylskYMFI5Bra2wRNNQBgjg39L9XU4866q3BQzJp3r0fLRVH8gHM54Jf0FVmTyHotR/Xiw5YavNy2qaQXesqqUv8HBIha0rFblbuYI/cFwOtJ47gu0QmgrU0ytDjlnmDNx4rfsNylwTIHS0Oc7Pezp7MzLmZxnTM9b5VMprAXnQr4rewXCOUKBSto+j4rD5/77DzXw96bbueNruaupb2Iy2OHXNGkB0vKFD3xHsXE2x75NBovtABEBAAHCwqwEGAEIACAWIQSNyQHOZBRsBIrVD7t5IVJSe3WSHgUCWZ4LegIbAgFACRB5IV
 JSe3WSHsB0IAQZAQgAHRYhBHsWQgTQlnI7AZY1qz6h3d2yYdl7BQJZngt6AAoJED6h3d2yYdl7CowH/Rp7GHEoPZTSUK8Ss7crwRmuAIDGBbSPkZbGmm4bOTaNs/gealc2tsVYpoMx7aYgqUW+t+84XciKHT+bjRv8uBnHescKZgDaomDuDKc2JVyx6samGFYuYPcGFReRcdmH0FOoPCn7bMW5mTPztV/wIA80LZD9kPKIXanfUyI3HLP0BPwZG4WTpKzJaalR1BNwu2oF6kEK0ymH3LfDiJ5Sr6emI2jrm4gH+/19ux/x+ST4tvm2PmH3BSQOPzgiqDiFd7RZoAIhmwr3FW4epsK9LtSxsi9gZ2vATBKO1oKtb6olW/keQT6uQCjqPSGojwzGRT2thEANH+5t6Vh0oDPZhrKUXRAAxHMBNHEaoo/M0sjZo+5OF3Ig1rMnI6XbKskLv6hu13cCymW0w/5E4XuYnyQ1cNC3pLvqDQbDx5mAPfBVHuqxJdRLQ3yDM/D2QIsxnkzQwi0FsJuni4vuJzWK/NHHDCvxMCh0YmSgbptUtgW8/niatd2Y6MbfRGxUHoctKtzqzivC8hKMTFrj4AbZhg/e9QVCsh5zSXtpWP0qFDJsxRMx0/432n9d4XUiy4U672r9Q09SsynB3QN6nTaCTWCIxGxjIb+8kJrRqTGwy/PElHX6kF0vQUWZNf2ITV1sd6LK/s/7sH+x4rzgUEHrsKr/qPvY3rUY/dQLd+owXesY83ANOu6oMWhSJnPMksbNa4tIKKbjmw3CFIOfoYHOWf3FtnydHNXoXfj4nBX8oSnkfhLILTJgf6JDFXfw6mTsv/jMzIfDs7PO1LK2oMK0+prSvSoM8bP9dmVEGIurzsTGjhTOBcb0zgyCmYVD3S48vZlTgHszAes1zwaCyt3/tOwrzU5JsRJVns+B/TUYaR/u3oIDMDygvE5ObWxXaFVnCC59r+zl0FazZ0ouyk2AYIR
 zHf+n1n98HCngRO4FRel2yzGDYO2rLPkXRm+NHCRvUA/i4zGkJs2AV0hsKK9/x8uMkBjHAdAheXhY+CsizGzsKjjfwvgqf84LwAzSDdZqLVE2yGTOwU0ESiArJwEQAJhtnC6pScWjzvvQ6rCTGAai6hrRiN6VLVVFLIMaMnlUp92EtgVSNpw6kANtRTpKXUB5fIPZVUrVdfEN06t96/6LE42tgifDAFyFTZY5FdHHri1GG/Cr39MpW2VqCDCtTTPVWHTUlU1ZG631BJ+9NB+ce58TmLr6wBTQrT+W367eRFBC54EsLNb7zQAspCn9pw1xf1XNHOGnrAQ4r9BXhOW5B8CzRd4nLRQwVgtw/c5M/bjemAOoq2WkwN+0mfJe4TSfHwFUozXuN274X+0Gr10fhp8xEDYuQM0qu6W3aDXMBBwIu0jTNudEELsTzhKUbqpsBc9WjwNMCZoCuSw/RTpFBV35mXbqQoQgbcU7uWZslLl9Wvv/C6rjXgd+GeX8SGBjTqq1ZkTv5UXLHTNQzPnbkNEExzqToi/QdSjFMIACnakeOSxc0ckfnsd9pfGv1PUyPyiwrHiqWFzBijzGIZEHxhNGFxAkXwTJR7Pd40a7RDxwbO6p/TSIIum41JtteehLHwTRDdQNMoyfLxuNLEtNYS0uR2jYI1EPQfCNWXCdT2ZK/l6GVP6jyB/olHBIOr+oVXqJh+48ki8cATPczhq3fUr7UivmguGwD67/4omZ4PCKtz1hNndnyYFS9QldEGo+AsB3AoUpVIA0XfQVkxD9IZr+Zu6aJ6nWq4M2bsoxABEBAAHCwXYEGAEIACACGwwWIQSNyQHOZBRsBIrVD7t5IVJSe3WSHgUCWPerZAAKCRB5IVJSe3WSHkIgEACTpxdn/FKrwH0/LDpZDTKWEWm4416l13RjhSt9CUhZ/Gm2GNfXcVTfoF/jKXXgjHcV1DHjfLUPmPVwMdqlf5ACOiFqIUM2ag/OEARh356w
 YG7YEobMjX0CThKe6AV2118XNzRBw/S2IO1LWnL5qaGYPZONUa9Pj0OaErdKIk/V1wge8Zoav2fQPautBcRLW5VA33PH1ggoqKQ4ES1hc9HC6SYKzTCGixu97mu/vjOa8DYgM+33TosLyNy+bCzw62zJkMf89X0tTSdaJSj5Op0SrRvfgjbC2YpJOnXxHr9qaXFbBZQhLjemZi6zRzUNeJ6A3Nzs+gIc4H7s/bYBtcd4ugPEhDeCGffdS3TppH9PnvRXfoa5zj5bsKFgjqjWolCyAmEvd15tXz5yNXtvrpgDhjF5ozPiNp/1EeWX4DxbH2i17drVu4fXwauFZ6lcsAcJxnvCA28RlQlmEQu/gFOx1axVXf6GIuXnQSjQN6qJbByUYrdc/cFCxPO2/lGuUxnufN9Tvb51Qh54laPgGLrlD2huQeSD9Sxa0MNUjNY0qLqaReT99Ygb2LPYGSLoFVx9iZz6sZNt07LqCx9qNgsJwsdmwYsNpMuFbc7nkWjtlEqzsXZHTvYN654p43S+hcAhmmOzQZcew6h71fAJLciiqsPBnCEdgCGFAWhZZdPkMA==

After the change, the entire key fits on a single line, neat!

Autocrypt: addr=anarcat@torproject.org; prefer-encrypt=nopreference;
 keydata=xjMEZHZPzhYJKwYBBAHaRw8BAQdAWdVzOFRW6FYVpeVaDo3sC4aJ2kUW4ukdEZ36UJLAHd7NKUFudG9pbmUgQmVhdXByw6kgPGFuYXJjYXRAdG9ycHJvamVjdC5vcmc+wpUEExYIAD4WIQS7ts1MmNdOE1inUqYCKTpvpOU0cwUCZHZgvwIbAwUJAeEzgAULCQgHAwUVCgkICwUWAgMBAAIeAQIXgAAKCRACKTpvpOU0c47SAPdEqfeHtFDx9UPhElZf7nSM69KyvPWXMocu9Kcu/sw1AQD5QkPzK5oxierims6/KUkIKDHdt8UcNp234V+UdD/ZB844BGR2UM4SCisGAQQBl1UBBQEBB0CYZha2IMY54WFXMG4S9/Smef54Pgon99LJ/hJ885p0ZAMBCAfCdwQYFggAIBYhBLu2zUyY104TWKdSpgIpOm+k5TRzBQJkdlDOAhsMAAoJEAIpOm+k5TRzBg0A+IbcsZhLx6FRIqBJCdfYMo7qovEo+vX0HZsUPRlq4HkBAIctCzmH3WyfOD/aUTeOF3tY+tIGUxxjQLGsNQZeGrQI

Note that I have implemented my own kind of ridiculous Autocrypt support for the Notmuch Emacs email client I use, see this elisp code. To import keys, I pipe the message into this script which is basically just:

sq autocrypt decode   gpg --import

... thanks to Sequoia best-of-class Autocrypt support.

Note on OpenPGP usage While some have claimed OpenPGP's death, I believe those are overstated. Maybe it's just me, but I still use OpenPGP for my password management, to authenticate users and messages, and it's the interface to my YubiKey for authenticating with SSH servers. I understand people feel that OpenPGP is possibly insecure, counter-intuitive and full of problems, but I think most of those problems should instead be attributed to its current flagship implementation, GnuPG. I have tried to work with GnuPG for years, and it keeps surprising me with evilness and oddities. I have high hopes that the Sequoia project can bring some sanity into this space, and I also hope that RFC4880bis can eventually get somewhere so we have a more solid specification with more robust crypto. It's kind of a shame that this has dragged on for so long, but Update: there's a separate draft called openpgp-crypto-refresh that might actually be adopted as the "OpenPGP RFC" soon! And it doesn't keep real work from happening in Sequoia and other implementations. Thunderbird rewrote their OpenPGP implementation with RNP (which was, granted, a bumpy road because it lost compatibility with GnuPG) and Sequoia now has a certificate store with trust management (but still no secret storage), preliminary OpenPGP card support and even a basic GnuPG compatibility layer. I'm also curious to try out the OpenPGP CA capabilities. So maybe it's just because I'm becoming an old fart that doesn't want to change tools, but so far I haven't seen a good incentive in switching away from OpenPGP, and haven't found a good set of tools that completely replace it. Maybe OpenSSH's keys and CA can eventually replace it, but I suspect they will end up rebuilding most of OpenPGP anyway, just more slowly. If they do, let's hope they avoid the mistakes our community has done in the past at least...

Welcome to the March 2023 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. As a quick recap, the motivation behind the reproducible builds effort is to ensure no malicious flaws have been introduced during compilation and distributing processes. It does this by ensuring identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to the project, please do visit our Contribute page on our website.

News There was progress towards making the Go programming language reproducible this month, with the overall goal remaining making the Go binaries distributed from Google and by Arch Linux (and others) to be bit-for-bit identical. These changes could become part of the upcoming version 1.21 release of Go. An issue in the Go issue tracker (#57120) is being used to follow and record progress on this.
Arnout Engelen updated our website to add and update reproducibility-related links for NixOS to reproducible.nixos.org. [ ]. In addition, Chris Lamb made some cosmetic changes to our presentations and resources page. [ ][ ]
Intel published a guide on how to reproducibly build their Trust Domain Extensions (TDX) firmware. TDX here refers to an Intel technology that combines their existing virtual machine and memory encryption technology with a new kind of virtual machine guest called a Trust Domain. This runs the CPU in a mode that protects the confidentiality of its memory contents and its state from any other software.
A reproducibility-related bug from early 2020 in the GNU GCC compiler as been fixed. The issues was that if GCC was invoked via the `as` frontend, the `-ffile-prefix-map` was being ignored. We were tracking this in Debian via the `build_path_captured_in_assembly_objects` issue. It has now been fixed and will be reflected in GCC version 13.
Holger Levsen will present at foss-north 2023 in April of this year in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years.
Anthony Andreoli, Anis Lounis, Mourad Debbabi and Aiman Hanna of the Security Research Centre at Concordia University, Montreal published a paper this month entitled On the prevalence of software supply chain attacks: Empirical study and investigative framework:
Software Supply Chain Attacks (SSCAs) typically compromise hosts through trusted but infected software. The intent of this paper is twofold: First, we present an empirical study of the most prominent software supply chain attacks and their characteristics. Second, we propose an investigative framework for identifying, expressing, and evaluating characteristic behaviours of newfound attacks for mitigation and future defense purposes. We hypothesize that these behaviours are statistically malicious, existed in the past, and thus could have been thwarted in modernity through their cementation x-years ago. [ ]

On our mailing list this month:

Mattia Rizzolo is asking everyone in the community to save the date for the 2023 s Reproducible Builds summit which will take place between October 31st and November 2nd at Dock Europe in Hamburg, Germany. Separate announcement(s) to follow. [ ]

ahojlm posted an message announcing a new project which is the first project offering bootstrappable and verifiable builds without any binary seeds. That is to say, a way of providing a verifiable path towards trusted software development platform without relying on pre-provided binary code in order to prevent against various forms of compiler backdoors. The project s homepage is hosted on Tor (mirror).

The minutes and logs from our March 2023 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 25th April at 15:00 UTC on `#reproducible-builds` on the OFTC network.
and as a Valentines Day present, Holger Levsen wrote on his blog on 14th February to express his thanks to OSUOSL for their continuous support of reproducible-builds.org. [ ]

Debian Vagrant Cascadian developed an easier setup for testing debian packages which uses sbuild s unshare mode along and reprotest, our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. [ ]
Over 30 reviews of Debian packages were added, 14 were updated and 7 were removed this month, all adding to our knowledge about identified issues. A number of issues were updated, including the Holger Levsen updating `build_path_captured_in_assembly_objects` to note that it has been fixed for GCC 13 [ ] and Vagrant Cascadian added new issues to mark packages where the build path is being captured via the Rust toolchain [ ] as well as new categorisation for where virtual packages have nondeterministic versioned dependencies [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann:

`cockpit` (gzip mtime)

`crmsh` (by mcepl: rewrite to avoid python toolchain issue)

`cx_Freeze` (merged, FTBFS-2038)

`golangci-lint` (date)

`guestfs-tools` (gzip mtime)

`perf` (merged, sort python scandir)

`perl-Date-Calc-XS` (FTBFS-2038)

`perl-Date-Calc` (FTBFS-2038)

`pw3270` (merged, date)

`python-dtaidistance` (drop unreproducible unnecessary file)

`sonic-pi` (FTBFS-2038)

`spack` (parallelism)

`tesseract` (fixed, CPU, -march=native)

Chris Lamb:

#1032409 filed against `esda`.

#1032759 filed against `gle-graphics-manual`.

Stefan Br ns:

`transfig/fig2dev` (also in openSUSE ; date in PDF)

Vagrant Cascadian:

#1033032 filed against `buddy`.

#1033089 filed against `subread`.

#1033663 filed against `linux`.

In addition, Vagrant Cascadian filed a bug with a patch to ensure GNU Modula-2 supports the `SOURCE_DATE_EPOCH` environment variable.

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In March, the following changes were made by Holger Levsen:

Arch Linux-related changes:

Build Arch packages in `/tmp/archlinux-ci/$SRCPACKAGE` instead of `/tmp/$SRCPACKAGE`. [ ]

Start 2/3 of the builds on the `o1` node, the rest on `o2`. [ ]

Add graphs for Arch Linux (and OpenWrt) builds. [ ]

Toggle Arch-related builders to debug why a specific node overloaded. [ ][ ][ ][ ]

Node health checks:

Detect `SetuptoolsDeprecationWarning` tracebacks in Python builds. [ ]

Detect failures do perform `chdist` calls. [ ][ ]

OSUOSL node migration.

Install `megacli` packages that are needed for hardware RAID. [ ][ ]

Add health check and maintenance jobs for new nodes. [ ]

Add mail config for new nodes. [ ][ ]

Handle a node running in the future correctly. [ ][ ]

Migrate some nodes to Debian bookworm. [ ]

Fix nodes health overview for osuosl3. [ ]

Make sure the `/srv/workspace` directory is owned by by the `jenkins` user. [ ]

Use `.debian.net` names everywhere, except when communicating with the outside world. [ ]

Grant fpierret access to a new node. [ ]

Update documentation. [ ]

Misc migration changes. [ ][ ][ ][ ][ ][ ][ ][ ]

Misc changes:

Enable fail2ban everywhere and monitor it with munin [ ].

Gracefully deal with non-existing Alpine schroots. [ ]

In addition, Roland Clobus is continuing his work on reproducible Debian ISO images:

Add/update openQA configuration [ ], and use the actual timestamp for openQA builds [ ].

Moved adding the user to the `docker` group from the `janitor_setup_worker` script to the (more general) `update_jdn.sh` script. [ ]

Use the (short-term) reproducible source when generating `live-build` images. [ ]

diffoscope development diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats as well. This month, Mattia Rizzolo released versions `238`, and Chris Lamb released versions `239` and `240`. Chris Lamb also made the following changes:

Fix compatibility with PyPDF 3.x, and correctly restore test data. [ ]

Rework PDF annotation handling into a separate method. [ ]

In addition, Holger Levsen performed a long-overdue overhaul of the Lintian overrides in the Debian packaging [ ][ ][ ][ ], and Mattia Rizzolo updated the packaging to silence an `include_package_data=True` [ ], fixed the build under Debian bullseye [ ], fixed tool name in a list of tools permitted to be absent during package build tests [ ] and as well as documented sending out an email upon [ ]. In addition, Vagrant Cascadian updated the version of GNU Guix to 238 [ and 239 [ ]. Vagrant also updated reprotest to version 0.7.23. [ ]

Other development work Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Twitter: @ReproBuilds

Mailing list: `rb-general@lists.reproducible-builds.org`

Today I noticed bounces in my mail box. They were from ikiwiki trying to send registration confirmation email to users who probably never asked for it. I'm getting truly fed up with spam in my wiki. At this point, all comments are manually approved and I still get trouble: now it's scammers spamming the registration form with dummy accounts, which bounce back to me when I make new posts, or just generate backscatter spam for the confirmation email. It's really bad. I have hundreds of users registered on my blog, and I don't know which are spammy, which aren't. So. I'm considering ditching ikiwiki comments altogether. I am testing Mastodon as a commenting platforms. Others (e.g. JAK) have implemented this as a server but a simpler approach is toload them dynamically from Mastodon, which is what Carl Shwan has done. They are using Hugo, however, so they can easily embed page metadata in the template to load the right server with the right comment ID. I wasn't sure how to do this in ikiwiki: it's typically hard to access page-specific metadata in templates. Even the page name is not there, for example. I have tried using templates, and that (obviously?) fails because the <script> stuff gets sanitized away. It seems I would need to split the JavaScript out of the template into a base template and then make the page template refer to a function in there. It's kind of horrible and messy. I wish there was a way to just access page metadata from the page template itself... I found out the meta plugin passes along its metadata, but that's not (easily) extensible. So i'd need to either patch that module, and my history of merged patches is not great so far. So: another plugin. I have something that kind of works that's a combination of a page.tmpl patch and a plugin. The plugin adds a mastodon directive that feeds the page.tmpl with the right stuff. On clicking a button, it injects comments from the Mastodon API, with a JavaScript callback. It's not pretty (it's not themed at all!), but it works. If you want to do this at home, you need this page.tmpl (or at least this patch and that one) and the mastodon.pm plugin from my mastodon-plugin branch. I'm not sure this is a good idea. The first test I did was a "test comment" which led to half a dozen "test reply". I then realized I couldn't redact individual posts from there. I don't even know if, when I mute a user, it actually gets hidden from everyone else too... So I'll test this for a while, I guess. I have also turned off all CGI on this site. It will keep users from registering while I cleanup this mess and think about next steps. I have other options as well if push comes to shove, but I'm unlikely to go back to ikiwiki comments. Mastodon comments are nice because they don't require me to run any extra software: either I have my own federated service I reuse, or I use someone else's, but I don't need to run something extra. And, of course, comments are published in a standard way that's interoperable with everything... On the other hand, now I won't have comments enabled until the blog is posted on Mastodon... Right now this happens only when feed2exec runs and the HTTP cache expires, which can take up to a day. I should probably do this some other way, like flush the cache when a new post arrives, or run post-commit hooks, but for now, this will have to do. Update: I figured out a way to make this work in a timely manner:

there's a post-merge hook in my ikiwiki git repository which calls feed2exec in /home/w-anarcat/source/.git/hooks/ took me a while to find it! I tried post-update and post-receive first, but ikiwiki actually pulls from the bare directory in the source directory, so only post-merge fires (even though it's not a merge)
feed2exec then finds new blog posts (if any!) and fires up the new ikiwikitoot plugin which then...
posts the toot using the toot command (it just works, why reinvent the wheel), keeping the toot URL
finds the Markdown source file associated with the post, and adds the magic mastodon directive
commits and pushes the result

This will make the interaction with Mastodon much smoother: as soon as a blog post is out of "draft" (i.e. when it hits the RSS feeds), this will immediately trigger and post the blog entry to Mastodon, enabling comments. It's kind of a tangled mess of stuff, but it works! I have briefly considered not using feed2exec for this, but it turns out it does an important job of parsing the result of ikiwiki's rendering. Otherwise I would have to guess which post is really a blog post, is this just an update or is it new, is it a draft, and so on... all sorts of questions where the business logic already resides in ikiwiki, and that I would need to reimplement myself. Plus it goes alongside moving more stuff (like my feed reader) to dedicated UNIX accounts (in this case, the blog sandbox) for security reasons. Whee!

I use GnuPG to compute cryptographic signatures for my emails, git commits/tags, and software release artifacts (tarballs). Part of GnuPG is gpg-agent which talks to OpenSSH, which I login to remote servers and to clone git repositories. I dislike storing cryptographic keys on general-purpose machines, and have used hardware-backed OpenPGP keys since around 2006 when I got a FSFE Fellowship Card. GnuPG via gpg-agent handles this well, and the private key never leaves the hardware. The ZeitControl cards were (to my knowledge) proprietary hardware running some non-free operating system and OpenPGP implementation. By late 2012 the YubiKey NEO supported OpenPGP, and while the hardware and operating system on it was not free, at least it ran a free software OpenPGP implementation and eventually I setup my primary RSA key on it. This worked well for a couple of years, and when I in 2019 wished to migrate to a new key, the FST-01G device with open hardware running free software that supported Ed25519 had become available. I created a key and have been using the FST-01G on my main laptop since then. This little device has been working, the signature counter on it is around 14501 which means around 10 signatures/day since then! Currently I am in the process of migrating towards a new laptop, and moving the FST-01G device between them is cumbersome, especially if I want to use both laptops in parallel. That s why I need to setup a new hardware device to hold my OpenPGP key, which can go with my new laptop. This is a good time to re-visit alternatives. I quickly decided that I did not want to create a new key, only to import my current one to keep everything working. My requirements on the device to chose hasn t changed since 2019, see my summary at the end of the earlier blog post. Unfortunately the FST-01G is out of stock and the newer FST-01SZ has also out of stock. While Tillitis looks promising (and I have one to play with), it does not support OpenPGP (yet). What to do? Fortunately, I found some FST-01SZ device in my drawer, and decided to use it pending a more satisfactory answer. Hopefully once I get around to generate a new OpenPGP key in a year or so, I will do a better survey of options that are available on the market then. What are your (freedom-respecting) OpenPGP hardware recommendations?

Similar to setting up the FST-01G, the FST-01SZ needs to be setup before use. I m doing the following from Trisquel 11 but any GNU/Linux system would work. When the device is inserted at first time, some kernel messages are shown (see /var/log/syslog or use the dmesg command):


usb 3-3: new full-speed USB device number 39 using xhci_hcd
usb 3-3: New USB device found, idVendor=234b, idProduct=0004, bcdDevice= 2.00
usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-3: Product: Fraucheky
usb 3-3: Manufacturer: Free Software Initiative of Japan
usb 3-3: SerialNumber: FSIJ-0.0
usb-storage 3-3:1.0: USB Mass Storage device detected
scsi host1: usb-storage 3-3:1.0
scsi 1:0:0:0: Direct-Access     FSIJ     Fraucheky        1.0  PQ: 0 ANSI: 0
sd 1:0:0:0: Attached scsi generic sg2 type 0
sd 1:0:0:0: [sdc] 128 512-byte logical blocks: (65.5 kB/64.0 KiB)
sd 1:0:0:0: [sdc] Write Protect is off
sd 1:0:0:0: [sdc] Mode Sense: 03 00 00 00
sd 1:0:0:0: [sdc] No Caching mode page found
sd 1:0:0:0: [sdc] Assuming drive cache: write through
 sdc:
sd 1:0:0:0: [sdc] Attached SCSI removable disk

Interestingly, the NeuG software installed on the device I got appears to be version 1.0.9:


jas@kaka:~$ head /media/jas/Fraucheky/README
NeuG - a true random number generator implementation
						  Version 1.0.9
						     2018-11-20
					           Niibe Yutaka
			      Free Software Initiative of Japan
What's NeuG?
============
jas@kaka:~$

I could not find version 1.0.9 published anywhere, but the device came with a SD-card that contain a copy of the source, so I uploaded it until a more canonical place is located. Putting the device in the serial mode can be done using a sudo eject /dev/sdc command which results in the following syslog output.


usb 3-3: reset full-speed USB device number 39 using xhci_hcd
usb 3-3: device firmware changed
usb 3-3: USB disconnect, device number 39
sdc: detected capacity change from 128 to 0
usb 3-3: new full-speed USB device number 40 using xhci_hcd
usb 3-3: New USB device found, idVendor=234b, idProduct=0001, bcdDevice= 2.00
usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-3: Product: NeuG True RNG
usb 3-3: Manufacturer: Free Software Initiative of Japan
usb 3-3: SerialNumber: FSIJ-1.0.9-42315277
cdc_acm 3-3:1.0: ttyACM0: USB ACM device

Now download Gnuk, verify its integrity and build it. You may need some additional packages installed, try apt-get install gcc-arm-none-eabi openocd python3-usb. As you can see, I m using the stable 1.2 branch of Gnuk, currently on version 1.2.20. The ./configure parameters deserve some explanation. The kdf_do=required sets up the device to require KDF usage. The --enable-factory-reset allows me to use the command factory-reset (with admin PIN) inside gpg --card-edit to completely wipe the card. Some may consider that too dangerous, but my view is that if someone has your admin PIN it is game over anyway. The --vidpid=234b:0000 is specifies the USB VID/PID to use, and --target=FST_01SZ is critical to set the platform (you ll may brick the device if you pick the wrong --target setting).


jas@kaka:~/src$ rm -rf gnuk neug
jas@kaka:~/src$ git clone https://gitlab.com/jas/neug.git
Cloning into 'neug'...
remote: Enumerating objects: 2034, done.
remote: Counting objects: 100% (2034/2034), done.
remote: Compressing objects: 100% (603/603), done.
remote: Total 2034 (delta 1405), reused 2013 (delta 1405), pack-reused 0
Receiving objects: 100% (2034/2034), 910.34 KiB   3.50 MiB/s, done.
Resolving deltas: 100% (1405/1405), done.
jas@kaka:~/src$ git clone https://salsa.debian.org/gnuk-team/gnuk/gnuk.git
Cloning into 'gnuk'...
remote: Enumerating objects: 13765, done.
remote: Counting objects: 100% (959/959), done.
remote: Compressing objects: 100% (337/337), done.
remote: Total 13765 (delta 629), reused 907 (delta 599), pack-reused 12806
Receiving objects: 100% (13765/13765), 12.59 MiB   3.05 MiB/s, done.
Resolving deltas: 100% (10077/10077), done.
jas@kaka:~/src$ cd neug
jas@kaka:~/src/neug$ git describe 
release/1.0.9
jas@kaka:~/src/neug$ git tag -v  git describe 
object 5d51022a97a5b7358d0ea62bbbc00628c6cec06a
type commit
tag release/1.0.9
tagger NIIBE Yutaka <gniibe@fsij.org> 1542701768 +0900
Version 1.0.9.
gpg: Signature made Tue Nov 20 09:16:08 2018 CET
gpg:                using EDDSA key 249CB3771750745D5CDD323CE267B052364F028D
gpg:                issuer "gniibe@fsij.org"
gpg: Good signature from "NIIBE Yutaka <gniibe@fsij.org>" [unknown]
gpg:                 aka "NIIBE Yutaka <gniibe@debian.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 249C B377 1750 745D 5CDD  323C E267 B052 364F 028D
jas@kaka:~/src/neug$ cd ../gnuk/
jas@kaka:~/src/gnuk$ git checkout STABLE-BRANCH-1-2 
Branch 'STABLE-BRANCH-1-2' set up to track remote branch 'STABLE-BRANCH-1-2' from 'origin'.
Switched to a new branch 'STABLE-BRANCH-1-2'
jas@kaka:~/src/gnuk$ git describe
release/1.2.20
jas@kaka:~/src/gnuk$ git tag -v  git describe 
object 9d3c08bd2beb73ce942b016d4328f0a596096c02
type commit
tag release/1.2.20
tagger NIIBE Yutaka <gniibe@fsij.org> 1650594032 +0900
Gnuk: Version 1.2.20
gpg: Signature made Fri Apr 22 04:20:32 2022 CEST
gpg:                using EDDSA key 249CB3771750745D5CDD323CE267B052364F028D
gpg: Good signature from "NIIBE Yutaka <gniibe@fsij.org>" [unknown]
gpg:                 aka "NIIBE Yutaka <gniibe@debian.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 249C B377 1750 745D 5CDD  323C E267 B052 364F 028D
jas@kaka:~/src/gnuk/src$ git submodule update --init
Submodule 'chopstx' (https://salsa.debian.org/gnuk-team/chopstx/chopstx.git) registered for path '../chopstx'
Cloning into '/home/jas/src/gnuk/chopstx'...
Submodule path '../chopstx': checked out 'e12a7e0bb3f004c7bca41cfdb24c8b66daf3db89'
jas@kaka:~/src/gnuk$ cd chopstx
jas@kaka:~/src/gnuk/chopstx$ git describe
release/1.21
jas@kaka:~/src/gnuk/chopstx$ git tag -v  git describe 
object e12a7e0bb3f004c7bca41cfdb24c8b66daf3db89
type commit
tag release/1.21
tagger NIIBE Yutaka <gniibe@fsij.org> 1650593697 +0900
Chopstx: Version 1.21
gpg: Signature made Fri Apr 22 04:14:57 2022 CEST
gpg:                using EDDSA key 249CB3771750745D5CDD323CE267B052364F028D
gpg: Good signature from "NIIBE Yutaka <gniibe@fsij.org>" [unknown]
gpg:                 aka "NIIBE Yutaka <gniibe@debian.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg:          There is no indication that the signature belongs to the owner.
Primary key fingerprint: 249C B377 1750 745D 5CDD  323C E267 B052 364F 028D
jas@kaka:~/src/gnuk/chopstx$ cd ../src
jas@kaka:~/src/gnuk/src$ kdf_do=required ./configure --enable-factory-reset --vidpid=234b:0000 --target=FST_01SZ
Header file is: board-fst-01sz.h
Debug option disabled
Configured for bare system (no-DFU)
PIN pad option disabled
CERT.3 Data Object is NOT supported
Card insert/removal by HID device is NOT supported
Life cycle management is supported
Acknowledge button is supported
KDF DO is required before key import/generation
jas@kaka:~/src/gnuk/src$ make   less
jas@kaka:~/src/gnuk/src$ cd ../regnual/
jas@kaka:~/src/gnuk/regnual$ make   less
jas@kaka:~/src/gnuk/regnual$ cd ../../
jas@kaka:~/src$ sudo python3 neug/tool/neug_upgrade.py -f gnuk/regnual/regnual.bin gnuk/src/build/gnuk.bin
gnuk/regnual/regnual.bin: 4608
gnuk/src/build/gnuk.bin: 109568
CRC32: b93ca829
Device: 
Configuration: 1
Interface: 1
20000e00:20005000
Downloading flash upgrade program...
start 20000e00
end   20002000
# 20002000: 32 : 4
Run flash upgrade program...
Wait 1 second...
Wait 1 second...
Device: 
08001000:08020000
Downloading the program
start 08001000
end   0801ac00
jas@kaka:~/src$

The kernel log will contain the following, and the card is ready to use as an OpenPGP card. You may unplug it and re-insert it as you wish.


usb 3-3: reset full-speed USB device number 41 using xhci_hcd
usb 3-3: device firmware changed
usb 3-3: USB disconnect, device number 41
usb 3-3: new full-speed USB device number 42 using xhci_hcd
usb 3-3: New USB device found, idVendor=234b, idProduct=0000, bcdDevice= 2.00
usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-3: Product: Gnuk Token
usb 3-3: Manufacturer: Free Software Initiative of Japan
usb 3-3: SerialNumber: FSIJ-1.2.20-42315277

Setting up the card is the next step, and there are many tutorials around for this, eventually I settled with the following sequence. Let s start with setting the admin PIN. First make sure that pcscd nor scdaemon is running, which is good hygien since those processes cache some information and with a stale connection this easily leads to confusion. Cache invalidation sigh.


jas@kaka:~$ gpg-connect-agent "SCD KILLSCD" "SCD BYE" /bye
jas@kaka:~$ ps auxww grep -e pcsc -e scd
jas        30221  0.0  0.0   3468  1692 pts/3    R+   11:49   0:00 grep --color=auto -e pcsc -e scd
jas@kaka:~$ gpg --card-edit
Reader ...........: 234B:0000:FSIJ-1.2.20-42315277:0
Application ID ...: D276000124010200FFFE423152770000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 42315277
Name of cardholder: [not set]
Language prefs ...: [not set]
Salutation .......: 
URL of public key : [not set]
Login data .......: [not set]
Signature PIN ....: forced
Key attributes ...: rsa2048 rsa2048 rsa2048
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: off
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
gpg/card> admin
Admin commands are allowed
gpg/card> kdf-setup
gpg/card> passwd
gpg: OpenPGP card no. D276000124010200FFFE423152770000 detected
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? 3
PIN changed.
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection?

Now it would be natural to setup the PIN and reset code. However the Gnuk software is configured to not allow this until the keys are imported. You would get the following somewhat cryptical error messages if you try. This took me a while to understand, since this is device-specific, and some other OpenPGP implementations allows you to configure a PIN and reset code before key import.


Your selection? 4
Error setting the Reset Code: Card error
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? 1
Error changing the PIN: Conditions of use not satisfied
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? q

Continue to configure the card and make it ready for key import. Some settings deserve comments. The lang field may be used to setup the language, but I have rarely seen it use, and I set it to sv (Swedish) mostly to be able to experiment if any software adhears to it. The URL is important to point to somewhere where your public key is stored, the fetch command of gpg --card-edit downloads it and sets up GnuPG with it when you are on a clean new laptop. The forcesig command changes the default so that a PIN code is not required for every digital signature operation, remember that I averaged 10 signatures per day for the past 2-3 years? Think of the wasted energy typing those PIN codes every time! Changing the cryptographic key type is required when I import 25519-based keys.


gpg/card> name
Cardholder's surname: Josefsson
Cardholder's given name: Simon
gpg/card> lang
Language preferences: sv
gpg/card> sex
Salutation (M = Mr., F = Ms., or space): m
gpg/card> login
Login data (account name): jas
gpg/card> url
URL to retrieve public key: https://josefsson.org/key-20190320.txt
gpg/card> forcesig
gpg/card> key-attr
Changing card key attribute for: Signature key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
Note: There is no guarantee that the card supports the requested size.
      If the key generation does not succeed, please check the
      documentation of your card to see what sizes are allowed.
Changing card key attribute for: Encryption key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: cv25519
Changing card key attribute for: Authentication key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
gpg/card> 
Reader ...........: 234B:0000:FSIJ-1.2.20-42315277:0
Application ID ...: D276000124010200FFFE423152770000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 42315277
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: on
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
gpg/card>

The device is now ready for key import! Bring out your offline laptop and boot it and use the keytocard command on the subkeys to import them. This assumes you saved a copy of the GnuPG home directory after generating the master and subkeys before, which I did in my own previous tutorial when I generated the keys. This may be a bit unusual, and there are simpler ways to do this (e.g., import a copy of the secret keys into a fresh GnuPG home directory).


$ cp -a gnupghome-backup-mastersubkeys gnupghome-import-fst01sz-42315277-2022-12-24
$ ps auxww grep -e pcsc -e scd
$ gpg --homedir $PWD/gnupghome-import-fst01sz-42315277-2022-12-24 --edit-key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
...
Secret key is available.
gpg: checking the trustdb
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb  cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb  ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb  ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 1
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb* cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb  ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb  ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Please select where to store the key:
   (2) Encryption key
Your selection? 2
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb* cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb  ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb  ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 1
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb  cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb  ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb  ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 2

sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb  cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb* ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb  ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Please select where to store the key:
   (3) Authentication key
Your selection? 3
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb  cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb* ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb  ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 2
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb  cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb  ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb  ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 3
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb  cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb  ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb* ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Please select where to store the key:
   (1) Signature key
   (3) Authentication key
Your selection? 1
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
ssb  cv25519/02923D7EE76EBD60
     created: 2019-03-20  expired: 2019-10-22  usage: E   
ssb  ed25519/80260EE8A9B92B2B
     created: 2019-03-20  expired: 2019-10-22  usage: A   
ssb* ed25519/51722B08FE4745A2
     created: 2019-03-20  expired: 2019-10-22  usage: S   
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> quit
Save changes? (y/N) y
$

Now insert it into your daily laptop and have GnuPG and learn about the new private keys and forget about any earlier locally available card bindings this usually manifests itself by GnuPG asking you to insert a OpenPGP card with another serial number. Earlier I did rm -rf ~/.gnupg/private-keys-v1.d/ but the scd serialno followed by learn --force is nicer. I also sets up trust setting for my own key.


jas@kaka:~$ gpg-connect-agent "scd serialno" "learn --force" /bye
...
jas@kaka:~$ echo "B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE:6:"   gpg --import-ownertrust
jas@kaka:~$ gpg --card-status
Reader ...........: 234B:0000:FSIJ-1.2.20-42315277:0
Application ID ...: D276000124010200FFFE423152770000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 42315277
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 5 5 5
Signature counter : 3
KDF setting ......: on
Signature key ....: A3CC 9C87 0B9D 310A BAD4  CF2F 5172 2B08 FE47 45A2
      created ....: 2019-03-20 23:40:49
Encryption key....: A9EC 8F4D 7F1E 50ED 3DEF  49A9 0292 3D7E E76E BD60
      created ....: 2019-03-20 23:40:26
Authentication key: CA7E 3716 4342 DF31 33DF  3497 8026 0EE8 A9B9 2B2B
      created ....: 2019-03-20 23:40:37
General key info..: sub  ed25519/51722B08FE4745A2 2019-03-20 Simon Josefsson <simon@josefsson.org>
sec#  ed25519/D73CF638C53C06BE  created: 2019-03-20  expires: 2023-09-19
ssb>  ed25519/80260EE8A9B92B2B  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
ssb>  ed25519/51722B08FE4745A2  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
ssb>  cv25519/02923D7EE76EBD60  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
jas@kaka:~$

Verify that you can digitally sign and authenticate using the key and you are done!


jas@kaka:~$ echo foo gpg -a --sign gpg --verify
gpg: Signature made Sat Dec 24 13:49:59 2022 CET
gpg:                using EDDSA key A3CC9C870B9D310ABAD4CF2F51722B08FE4745A2
gpg: Good signature from "Simon Josefsson <simon@josefsson.org>" [ultimate]
jas@kaka:~$ ssh-add -L
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILzCFcHHrKzVSPDDarZPYqn89H5TPaxwcORgRg+4DagE cardno:FFFE42315277
jas@kaka:~$

So time to relax and celebrate christmas? Hold on not so fast! Astute readers will have noticed that the output said PIN retry counter: 5 5 5 . That s not the default PIN retry counter for Gnuk! How did that happen? Indeed, good catch and great question, my dear reader. I wanted to include how you can modify the Gnuk source code, re-build it and re-flash the Gnuk as well. This method is different than flashing Gnuk onto a device that is running NeuG so the commands I used to flash the firmware in the start of this blog post no longer works in a device running Gnuk. Fortunately modern Gnuk supports updating firmware by specifying the Admin PIN code only, and provides a simple script to achieve this as well. The PIN retry counter setting is hard coded in the openpgp-do.c file, and we run a a perl command to modify the file, rebuild Gnuk and upgrade the FST-01SZ. This of course wipes all your settings, so you will have the opportunity to practice all the commands earlier in this post once again!


jas@kaka:~/src/gnuk/src$ perl -pi -e 's/PASSWORD_ERRORS_MAX 3/PASSWORD_ERRORS_MAX 5/' openpgp-do.c
jas@kaka:~/src/gnuk/src$ make   less
jas@kaka:~/src/gnuk/src$ cd ../tool/
jas@kaka:~/src/gnuk/tool$ ./upgrade_by_passwd.py 
Admin password: 
Device: 
Configuration: 1
Interface: 0
../regnual/regnual.bin: 4608
../src/build/gnuk.bin: 110592
CRC32: b93ca829
Device: 
Configuration: 1
Interface: 0
20002a00:20005000
Downloading flash upgrade program...
start 20002a00
end   20003c00
Run flash upgrade program...
Waiting for device to appear:
  Wait 1 second...
  Wait 1 second...
Device: 
08001000:08020000
Downloading the program
start 08001000
end   0801b000
Protecting device
Finish flashing
Resetting device
Update procedure finished
jas@kaka:~/src/gnuk/tool$

Now finally, I wish you all a Merry Christmas and Happy Hacking!

Review: Contact, by Carl Sagan

Publisher:	Pocket Books
Copyright:	1985
Printing:	October 1986
ISBN:	0-671-43422-5
Format:	Mass market
Pages:	434

Contact is a standalone first-contact science fiction novel. Carl Sagan (1934 1996) was best known as a non-fiction writer, astronomer, and narrator of the PBS non-fiction program Cosmos. This is his first and only novel. Ellie Arroway is the director of Project Argus, a radio telescope array in the New Mexico desert whose primary mission is SETI: the search for extra-terrestrial intelligence by scanning the skies for unexpected radio signals. Its assignment to SETI is controversial; there are radio astronomy projects waiting, and although 25% of the telescope time is assigned to non-SETI projects, some astronomers think the SETI mission should be scrapped and Argus fully diverted to more useful research. That changes overnight when Argus picks up a signal from Vega, binary pulses representing the sequence of prime numbers. The signal of course doesn't stop when the Earth rotates, so Ellie and her team quickly notify every radio observatory they can get hold of to follow the signal as it passes out of their line of sight. Before long, nearly every country with a radio telescope is involved, and Russian help is particularly vital since they have ship-mounted equipment. The US military and intelligence establishment isn't happy about this and make a few attempts to shove the genie back into the bottle and keep any further discoveries secret, without a lot of success. (Sagan did not anticipate the end of the Cold War, and yet ironically relations with the Russians in his version of the 1990s are warmer by far than they are today. Not that this makes the military types any happier.) For better or worse, making sense of the alien signal becomes a global project. You may be familiar with this book through its 1997 movie adaptation starring Jodie Foster. What I didn't know before reading this book is that it started life as a movie treatment, co-written with Ann Druyan, in 1979. When the movie stalled, Sagan expanded it into a novel. (Given the thanks to Druyan in the author's note, it may not be far wrong to name her as a co-author.) If you've seen the movie, you will have a good idea of what will happen, but the book gives the project a more realistic international scope. Ellie has colleagues carefully selected from all over the world, including for the climactic moment of the story. The biggest problem with Contact as a novel is that Sagan is a non-fiction writer who didn't really know how to write a novel. The long, detailed descriptions of the science and the astronomical equipment fit a certain type of SF story, but the descriptions of the characters, even Ellie, are equally detailed and yet use the same style. The book starts with an account of Ellie's childhood and path into science written like a biography or a magazine profile, not like a novel in which she's the protagonist. The same is true of the other characters: we get characterization of a sort, but the tone ranges from Wikipedia article to long-form essay and never quite feels like a story. Ellie is the most interesting character in the book, partly because the way Sagan writes her is both distant but oddly compelling. Sagan (or perhaps Druyan?) tries hard to show what life is like for a woman born in the middle of the 20th century who is interested in science and I think mostly succeeds, although Ellie's reactions to sexism felt weirdly emotionless. The descriptions of her relationships are even odder and the parts where this book felt the least like a novel, but Sagan does sell some of that tone as reflective of Ellie's personality. She's a scientist, the work is the center of her life, and everything else, even when important, is secondary. It doesn't entirely make the writing style work, but it helps. Sagan does attempt to give Ellie a personal development arc related to her childhood and her relationships with her father and step-father. I thought the conclusion to that was neither believable nor anywhere near as important as Sagan thought it was, which was off-putting. Better were her ongoing arguments with evangelical Christians, one of whom is a close-minded ass and the other of which is a far more interesting character. They felt wedged into this book, and I'm dubious a realistic version of Ellie would have been the person to have those debates, but it's a subject Sagan clearly has deep personal interest in and that shows in how they're written. The other problem with Contact as a novel is that Sagan does not take science fiction seriously as a genre, instead treating it as a way to explore a thought experiment. To a science fiction reader, or at least to this science fiction reader, the interesting bits of this story involve the aliens. Those are not the bits Sagan is interested in. His attention is on how this sort of contact, and project, would affect humanity and human politics. We do get some more traditional science fiction near the end of the book, but Sagan then immediately backs away from it, minimizes that part of the story, and focuses exclusively on the emotional and philosophical implications for humans of his thought experiment. Since I found his philosophical musings about agnosticism and wonder and discovery less interesting than the actual science fiction bits, I found this somewhat annoying. The ending felt a bit more like a cheap trick than a satisfying conclusion. Interestingly, this entire novel is set in an alternate universe, for reasons entirely unexplained (at least that I noticed) in the book. It's set in the late 1990s but was written in 1985, so of course this is an alternate future, but the 1985 of this world still isn't ours. Yuri Gagarin was the first man to set foot on the moon, and the space program and the Cold War developed in subtly different ways. I'm not sure why Sagan made that choice, but it felt to me like he was separating his thought experiment farther from our world to give the ending more plausible deniability. There are, at the time of the novel, permanent orbital colonies for (mostly) rich people, because living in space turns out to greatly extend human lifespans. That gives Sagan an opportunity to wax poetic about the life-altering effects of seeing Earth from space, which in his alternate timeline rapidly sped up nuclear disarmament and made the rich more protective of the planet. This is an old bit of space boosterism that isn't as common these days, mostly because it's become abundantly clear that human psychology doesn't work this way. Sadly, rich sociopaths remain sociopaths even when you send them into space. I was a bit torn between finding Sagan's predictions charmingly hopeful and annoyingly daft. I don't think this novel is very successful as a novel. It's much longer than it needs to be and long parts of it drag. But it's still oddly readable; even knowing the rough shape of the ending in advance, I found it hard to put down once the plot properly kicks into gear about two-thirds of the way through. There's a lot in here that I'd argue with Sagan about, but he's thoughtful and makes a serious attempt to work out the political and scientific ramifications of such a discovery in detail. Despite the dry tone, he does a surprisingly good job capturing the anticipation and excitement of a very expensive and risky scientific experiment. I'm not sure I would recommend this book to anyone, but I'm also the person who found Gregory Benford's Timescape to be boring and tedious, despite its rave reviews as a science fiction novel about the practice of science. If that sort of book is more your jam, you may like Contact better than I did. Rating: 6 out of 10

In my tubman setup, I started using ZFS on an old server I had lying around. The machine is really old though (2011!) and it "feels" pretty slow. I want to see how much of that is ZFS and how much is the machine. Synthetic benchmarks show that ZFS may be slower than mdadm in RAID-10 or RAID-6 configuration, so I want to confirm that on a live workload: my workstation. Plus, I want easy, regular, high performance backups (with send/receive snapshots) and there's no way I'm going to use BTRFS because I find it too confusing and unreliable. So off we go.

Installation Since this is a conversion (and not a new install), our procedure is slightly different than the official documentation but otherwise it's pretty much in the same spirit: we're going to use ZFS for everything, including the root filesystem. So, install the required packages, on the current system:

apt install --yes gdisk zfs-dkms zfs zfs-initramfs zfsutils-linux

We also tell DKMS that we need to rebuild the initrd when upgrading:

echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf

Partitioning This is going to partition /dev/sdc with:

1MB MBR / BIOS legacy boot
512MB EFI boot
1GB bpool, unencrypted pool for /boot

rest of the disk for zpool, the rest of the data

 sgdisk --zap-all /dev/sdc
 sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/sdc
 sgdisk     -n2:1M:+512M   -t2:EF00 /dev/sdc
 sgdisk     -n3:0:+1G      -t3:BF01 /dev/sdc
 sgdisk     -n4:0:0        -t4:BF00 /dev/sdc

That will look something like this:

    root@curie:/home/anarcat# sgdisk -p /dev/sdc
    Disk /dev/sdc: 1953525168 sectors, 931.5 GiB
    Model: ESD-S1C         
    Sector size (logical/physical): 512/512 bytes
    Disk identifier (GUID): [REDACTED]
    Partition table holds up to 128 entries
    Main partition table begins at sector 2 and ends at sector 33
    First usable sector is 34, last usable sector is 1953525134
    Partitions will be aligned on 16-sector boundaries
    Total free space is 14 sectors (7.0 KiB)
    Number  Start (sector)    End (sector)  Size       Code  Name
       1              48            2047   1000.0 KiB  EF02  
       2            2048         1050623   512.0 MiB   EF00  
       3         1050624         3147775   1024.0 MiB  BF01  
       4         3147776      1953525134   930.0 GiB   BF00

Unfortunately, we can't be sure of the sector size here, because the USB controller is probably lying to us about it. Normally, this smartctl command should tell us the sector size as well:

root@curie:~# smartctl -i /dev/sdb -qnoserial
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-14-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Black Mobile
Device Model:     WDC WD10JPLX-00MBPT0
Firmware Version: 01.01H01
User Capacity:    1 000 204 886 016 bytes [1,00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue May 17 13:33:04 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Above is the example of the builtin HDD drive. But the SSD device enclosed in that USB controller doesn't support SMART commands, so we can't trust that it really has 512 bytes sectors. This matters because we need to tweak the ashift value correctly. We're going to go ahead the SSD drive has the common 4KB settings, which means ashift=12. Note here that we are not creating a separate partition for swap. Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and that issue is still not fixed upstream. Ubuntu recommends using a separate partition for swap instead. But since this is "just" a workstation, we're betting that we will not suffer from this problem, after hearing a report from another Debian developer running this setup on their workstation successfully. We do not recommend this setup though. In fact, if I were to redo this partition scheme, I would probably use LUKS encryption and setup a dedicated swap partition, as I had problems with ZFS encryption as well.

Creating pools ZFS pools are somewhat like "volume groups" if you are familiar with LVM, except they obviously also do things like RAID-10. (Even though LVM can technically also do RAID, people typically use mdadm instead.) In any case, the guide suggests creating two different pools here: one, in cleartext, for boot, and a separate, encrypted one, for the rest. Technically, the boot partition is required because the Grub bootloader only supports readonly ZFS pools, from what I understand. But I'm a little out of my depth here and just following the guide.

Boot pool creation This creates the boot pool in readonly mode with features that grub supports:

    zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O canmount=off \
        -O compression=lz4 \
        -O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
        -O mountpoint=/boot -R /mnt \
        bpool /dev/sdc3

I haven't investigated all those settings and just trust the upstream guide on the above.

Main pool creation This is a more typical pool creation.

    zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool /dev/sdc4

Breaking this down:

-o ashift=12: mentioned above, 4k sector size
-O encryption=on -O keylocation=prompt -O keyformat=passphrase: encryption, prompt for a password, default algorithm is aes-256-gcm, explicit in the guide, made implicit here
-O acltype=posixacl -O xattr=sa: enable ACLs, with better performance (not enabled by default)
-O dnodesize=auto: related to extended attributes, less compatibility with other implementations
-O compression=zstd: enable zstd compression, can be disabled/enabled by dataset to with zfs set compression=off rpool/example
-O relatime=on: classic atime optimisation, another that could be used on a busy server is atime=off
-O canmount=off: do not make the pool mount automatically with mount -a?
-O mountpoint=/ -R /mnt: mount pool on / in the future, but /mnt for now

Those settings are all available in zfsprops(8). Other flags are defined in zpool-create(8). The reasoning behind them is also explained in the upstream guide and some also in [the Debian wiki][]. Those flags were actually not used:

-O normalization=formD: normalize file names on comparisons (not storage), implies utf8only=on, which is a bad idea (and effectively meant my first sync failed to copy some files, including this folder from a supysonic checkout). and this cannot be changed after the filesystem is created. bad, bad, bad.

[the Debian wiki]: https://wiki.debian.org/ZFS#Advanced_Topics

Side note about single-disk pools Also note that we're living dangerously here: single-disk ZFS pools are rumoured to be more dangerous than not running ZFS at all. The choice quote from this article is:
[...] any error can be detected, but cannot be corrected. This sounds like an acceptable compromise, but its actually not. The reason its not is that ZFS' metadata cannot be allowed to be corrupted. If it is it is likely the zpool will be impossible to mount (and will probably crash the system once the corruption is found). So a couple of bad sectors in the right place will mean that all data on the zpool will be lost. Not some, all. Also there's no ZFS recovery tools, so you cannot recover any data on the drives.
Compared with (say) ext4, where a single disk error can recovered, this is pretty bad. But we are ready to live with this with the idea that we'll have hourly offline snapshots that we can easily recover from. It's trade-off. Also, we're running this on a NVMe/M.2 drive which typically just blinks out of existence completely, and doesn't "bit rot" the way a HDD would. Also, the FreeBSD handbook quick start doesn't have any warnings about their first example, which is with a single disk. So I am reassured at least.

Creating mount points Next we create the actual filesystems, known as "datasets" which are the things that get mounted on mountpoint and hold the actual files.

this creates two containers, for ROOT and BOOT
```
 zfs create -o canmount=off -o mountpoint=none rpool/ROOT &&
 zfs create -o canmount=off -o mountpoint=none bpool/BOOT
```
Note that it's unclear to me why those datasets are necessary, but they seem common practice, also used in this FreeBSD example. The OpenZFS guide mentions the Solaris upgrades and Ubuntu's zsys that use that container for upgrades and rollbacks. This blog post seems to explain a bit the layout behind the installer.
this creates the actual boot and root filesystems:
```
 zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian &&
 zfs mount rpool/ROOT/debian &&
 zfs create -o mountpoint=/boot bpool/BOOT/debian
```
I guess the debian name here is because we could technically have multiple operating systems with the same underlying datasets.

then the main datasets:

 zfs create                                 rpool/home &&
 zfs create -o mountpoint=/root             rpool/home/root &&
 chmod 700 /mnt/root &&
 zfs create                                 rpool/var

exclude temporary files from snapshots:

 zfs create -o com.sun:auto-snapshot=false  rpool/var/cache &&
 zfs create -o com.sun:auto-snapshot=false  rpool/var/tmp &&
 chmod 1777 /mnt/var/tmp

and skip automatic snapshots in Docker:
```
 zfs create -o canmount=off                 rpool/var/lib &&
 zfs create -o com.sun:auto-snapshot=false  rpool/var/lib/docker
```
Notice here a peculiarity: we must create rpool/var/lib to create rpool/var/lib/docker otherwise we get this error:
```
 cannot create 'rpool/var/lib/docker': parent does not exist
```
... and no, just creating /mnt/var/lib doesn't fix that problem. In fact, it makes things even more confusing because an existing directory shadows a mountpoint, which is the opposite of how things normally work. Also note that you will probably need to change storage driver in Docker, see the zfs-driver documentation for details but, basically, I did:
```
echo '  "storage-driver": "zfs"  ' > /etc/docker/daemon.json
```
Note that podman has the same problem (and similar solution):
```
printf '[storage]\ndriver = "zfs"\n' > /etc/containers/storage.conf
```

make a tmpfs for /run:

 mkdir /mnt/run &&
 mount -t tmpfs tmpfs /mnt/run &&
 mkdir /mnt/run/lock

We don't create a /srv, as that's the HDD stuff. Also mount the EFI partition:

mkfs.fat -F 32 /dev/sdc2 &&
mount /dev/sdc2 /mnt/boot/efi/

At this point, everything should be mounted in /mnt. It should look like this:

root@curie:~# LANG=C df -h -t zfs -t vfat
Filesystem            Size  Used Avail Use% Mounted on
rpool/ROOT/debian     899G  384K  899G   1% /mnt
bpool/BOOT/debian     832M  123M  709M  15% /mnt/boot
rpool/home            899G  256K  899G   1% /mnt/home
rpool/home/root       899G  256K  899G   1% /mnt/root
rpool/var             899G  384K  899G   1% /mnt/var
rpool/var/cache       899G  256K  899G   1% /mnt/var/cache
rpool/var/tmp         899G  256K  899G   1% /mnt/var/tmp
rpool/var/lib/docker  899G  256K  899G   1% /mnt/var/lib/docker
/dev/sdc2             511M  4.0K  511M   1% /mnt/boot/efi

Now that we have everything setup and mounted, let's copy all files over.

Copying files This is a list of all the mounted filesystems

for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done

You can check that the list is correct with:

mount -l -t ext4,btrfs,vfat   awk ' print $3 '

Note that we skip /srv as it's on a different disk. On the first run, we had:

root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
     16,831,437 100%  184.14MB/s    0:00:00 (xfr#101, to-chk=0/110)
syncing / to /mnt/...
 28,019,293,280  94%   47.63MB/s    0:09:21 (xfr#703710, ir-chk=6748/839220)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,267,990  98%   50.71MB/s    0:10:40 (xfr#736577, to-chk=0/867732)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
 24,456,268,098  98%   68.03MB/s    0:05:42 (xfr#159867, ir-chk=6875/172377) 
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/B3AB0CDA9C4454B3C1197E5A22669DF8EE849D90"
199,762,528,125  93%   74.82MB/s    0:42:26 (xfr#1437846, ir-chk=1018/1983979)rsync: [generator] recv_generator: mkdir "/mnt/home/anarcat/dist/supysonic/tests/assets/\#346" failed: Invalid or incomplete multibyte or wide character (84)
*** Skipping any contents from this failed directory ***
315,384,723,978  96%   76.82MB/s    1:05:15 (xfr#2256473, to-chk=0/2993950)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]

Note the failure to transfer that supysonic file? It turns out they had a weird filename in their source tree, since then removed, but still it showed how the utf8only feature might not be such a bad idea. At this point, the procedure was restarted all the way back to "Creating pools", after unmounting all ZFS filesystems (

umount
/mnt/run /mnt/boot/efi && umount -t zfs -a

) and destroying the pool, which, surprisingly, doesn't require any confirmation (

zpool destroy
rpool

). The second run was cleaner:

root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
        echo "syncing $fs to /mnt$fs..." && 
        rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
    done
syncing /boot/ to /mnt/boot/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/299)  
syncing /boot/efi/ to /mnt/boot/efi/...
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/110)  
syncing / to /mnt/...
 28,019,033,070  97%   42.03MB/s    0:10:35 (xfr#703671, ir-chk=1093/833515)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
 34,081,807,102  98%   44.84MB/s    0:12:04 (xfr#736580, to-chk=0/867723)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
IO error encountered -- skipping file deletion
 24,043,086,450  96%   62.03MB/s    0:06:09 (xfr#151819, ir-chk=15117/172571)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/4C1FDBFEA976FF924D062FB990B24B897A77B84B"
315,423,626,507  96%   67.09MB/s    1:14:43 (xfr#2256845, to-chk=0/2994364)    
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]

Also note the transfer speed: we seem capped at 76MB/s, or 608Mbit/s. This is not as fast as I was expecting: the USB connection seems to be at around 5Gbps:

anarcat@curie:~$ lsusb -tv   head -4
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
     __ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
        ID 0b05:1932 ASUSTek Computer, Inc.

So it shouldn't cap at that speed. It's possible the USB adapter is failing to give me the full speed though. It's not the M.2 SSD drive either, as that has a ~500MB/s bandwidth, acccording to its spec. At this point, we're about ready to do the final configuration. We drop to single user mode and do the rest of the procedure. That used to be shutdown now, but it seems like the systemd switch broke that, so now you can reboot into grub and pick the "recovery" option. Alternatively, you might try systemctl rescue, as I found out. I also wanted to copy the drive over to another new NVMe drive, but that failed: it looks like the USB controller I have doesn't work with older, non-NVME drives.

Boot configuration Now we need to enter the new system to rebuild the boot loader and initrd and so on. First, we bind mounts and chroot into the ZFS disk:

mount --rbind /dev  /mnt/dev &&
mount --rbind /proc /mnt/proc &&
mount --rbind /sys  /mnt/sys &&
chroot /mnt /bin/bash

Next we add an extra service that imports the bpool on boot, to make sure it survives a zpool.cache destruction:

cat > /etc/systemd/system/zfs-import-bpool.service <<EOF
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache
[Install]
WantedBy=zfs-import.target
EOF

Enable the service:

systemctl enable zfs-import-bpool.service

I had to trim down /etc/fstab and /etc/crypttab to only contain references to the legacy filesystems (/srv is still BTRFS!). If we don't already have a tmpfs defined in /etc/fstab:

ln -s /usr/share/systemd/tmp.mount /etc/systemd/system/ &&
systemctl enable tmp.mount

Rebuild boot loader with support for ZFS, but also to workaround GRUB's missing zpool-features support:

grub-probe /boot   grep -q zfs &&
update-initramfs -c -k all &&
sed -i 's,GRUB_CMDLINE_LINUX.*,GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian",' /etc/default/grub &&
update-grub

For good measure, make sure the right disk is configured here, for example you might want to tag both drives in a RAID array:

dpkg-reconfigure grub-pc

Install grub to EFI while you're there:

grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian --recheck --no-floppy

Filesystem mount ordering. The rationale here in the OpenZFS guide is a little strange, but I don't dare ignore that.

mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F &

Verify that zed updated the cache by making sure these are not empty:

cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool

Once the files have data, stop zed:

fg
Press Ctrl-C.

Fix the paths to eliminate /mnt:

sed -Ei "s /mnt/? / " /etc/zfs/zfs-list.cache/*

Snapshot initial install:

zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install

Exit chroot:

exit

Finalizing One last sync was done in rescue mode:

for fs in /boot/ /boot/efi/ / /home/; do
    echo "syncing $fs to /mnt$fs..." && 
    rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done

Then we unmount all filesystems:

mount   grep -v zfs   tac   awk '/\/mnt/  print $3 '   xargs -i  umount -lf  
zpool export -a

Reboot, swap the drives, and boot in ZFS. Hurray!

Benchmarks This is a test that was ran in single-user mode using fio and the Ars Technica recommended tests, which are:

Single 4KiB random write process:

 fio --name=randwrite4k1x --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1

16 parallel 64KiB random write processes:

 fio --name=randwrite64k16x --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1

Single 1MiB random write process:

 fio --name=randwrite1m1x --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1

Strangely, that's not exactly what the author, Jim Salter, did in his actual test bench used in the ZFS benchmarking article. The first thing is there's no read test at all, which is already pretty strange. But also it doesn't include stuff like dropping caches or repeating results. So here's my variation, which i called fio-ars-bench.sh for now. It just batches a bunch of fio tests, one by one, 60 seconds each. It should take about 12 minutes to run, as there are 3 pair of tests, read/write, with and without async. My bias, before building, running and analysing those results is that ZFS should outperform the traditional stack on writes, but possibly not on reads. It's also possible it outperforms it on both, because it's a newer drive. A new test might be possible with a new external USB drive as well, although I doubt I will find the time to do this.

Results All tests were done on WD blue SN550 drives, which claims to be able to push 2400MB/s read and 1750MB/s write. An extra drive was bought to move the LVM setup from a WDC WDS500G1B0B-00AS40 SSD, a WD blue M.2 2280 SSD that was at least 5 years old, spec'd at 560MB/s read, 530MB/s write. Benchmarks were done on the M.2 SSD drive but discarded so that the drive difference is not a factor in the test. In practice, I'm going to assume we'll never reach those numbers because we're not actually NVMe (this is an old workstation!) so the bottleneck isn't the disk itself. For our purposes, it might still give us useful results.

Rescue test, LUKS/LVM/ext4 Those tests were performed with everything shutdown, after either entering the system in rescue mode, or by reaching that target with:

systemctl rescue

The network might have been started before or after the test as well:

systemctl start systemd-networkd

So it should be fairly reliable as basically nothing else is running. Raw numbers, from the ?job-curie-lvm.log, converted to MiB/s and manually merged:

test	read I/O	read IOPS	write I/O	write IOPS
rand4k4g1x	39.27	10052	212.15	54310
rand4k4g1x--fsync=1	39.29	10057	2.73	699
rand64k256m16x	1297.00	20751	1068.57	17097
rand64k256m16x--fsync=1	1290.90	20654	353.82	5661
rand1m16g1x	315.15	315	563.77	563
rand1m16g1x--fsync=1	345.88	345	157.01	157

Peaks are at about 20k IOPS and ~1.3GiB/s read, 1GiB/s write in the 64KB blocks with 16 jobs. Slowest is the random 4k block sync write at an abysmal 3MB/s and 700 IOPS The 1MB read/write tests have lower IOPS, but that is expected.

Rescue test, ZFS This test was also performed in rescue mode. Raw numbers, from the ?job-curie-zfs.log, converted to MiB/s and manually merged:

test read I/O read IOPS write I/O write IOPS

rand4k4g1x 77.20 19763 27.13 6944

rand4k4g1x--fsync=1 76.16 19495 6.53 1673

rand64k256m16x 1882.40 30118 70.58 1129

rand64k256m16x--fsync=1 1865.13 29842 71.98 1151

rand1m16g1x 921.62 921 102.21 102

rand1m16g1x--fsync=1 908.37 908 64.30 64

Peaks are at 1.8GiB/s read, also in the 64k job like above, but much faster. The write is, as expected, much slower at 70MiB/s (compared to 1GiB/s!), but it should be noted the sync write doesn't degrade performance compared to async writes (although it's still below the LVM 300MB/s).

test	read I/O	read IOPS	write I/O	write IOPS
rand4k4g1x	77.20	19763	27.13	6944
rand4k4g1x--fsync=1	76.16	19495	6.53	1673
rand64k256m16x	1882.40	30118	70.58	1129
rand64k256m16x--fsync=1	1865.13	29842	71.98	1151
rand1m16g1x	921.62	921	102.21	102
rand1m16g1x--fsync=1	908.37	908	64.30	64

Conclusions Really, ZFS has trouble performing in all write conditions. The random 4k sync write test is the only place where ZFS outperforms LVM in writes, and barely (7MiB/s vs 3MiB/s). Everywhere else, writes are much slower, sometimes by an order of magnitude. And before some ZFS zealot jumps in talking about the SLOG or some other cache that could be added to improved performance, I'll remind you that those numbers are on a bare bones NVMe drive, pretty much as fast storage as you can find on this machine. Adding another NVMe drive as a cache probably will not improve write performance here. Still, those are very different results than the tests performed by Salter which shows ZFS beating traditional configurations in all categories but uncached 4k reads (not writes!). That said, those tests are very different from the tests I performed here, where I test writes on a single disk, not a RAID array, which might explain the discrepancy. Also, note that neither LVM or ZFS manage to reach the 2400MB/s read and 1750MB/s write performance specification. ZFS does manage to reach 82% of the read performance (1973MB/s) and LVM 64% of the write performance (1120MB/s). LVM hits 57% of the read performance and ZFS hits barely 6% of the write performance. Overall, I'm a bit disappointed in the ZFS write performance here, I must say. Maybe I need to tweak the record size or some other ZFS voodoo, but I'll note that I didn't have to do any such configuration on the other side to kick ZFS in the pants...

Real world experience This section document not synthetic backups, but actual real world workloads, comparing before and after I switched my workstation to ZFS.

Docker performance I had the feeling that running some git hook (which was firing a Docker container) was "slower" somehow. It seems that, at runtime, ZFS backends are significant slower than their overlayfs/ext4 equivalent:

May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:53 curie dockerd[1723]: time="2022-05-16T14:42:53.087219426-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 pid=151170
May 16 14:42:53 curie systemd[1]: Started libcontainer container af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.
May 16 14:42:54 curie systemd[1]: docker-af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.scope: Succeeded.
May 16 14:42:54 curie dockerd[1723]: time="2022-05-16T14:42:54.047297800-04:00" level=info msg="shim disconnected" id=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10
May 16 14:42:54 curie dockerd[998]: time="2022-05-16T14:42:54.051365015-04:00" level=info msg="ignoring event" container=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 14:42:54 curie systemd[2444]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[2444]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.

Translating this:

container setup: ~1 second
container runtime: ~1 second
container teardown: ~1 second
total runtime: 2-3 seconds

Obviously, those timestamps are not quite accurate enough to make precise measurements... After I switched to ZFS:

mai 30 15:31:39 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:39 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded. 
mai 30 15:31:40 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:40 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:41 curie dockerd[3199]: time="2022-05-30T15:31:41.551403693-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 pid=141080 
mai 30 15:31:41 curie systemd[1]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[5287]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded. 
mai 30 15:31:41 curie systemd[1]: Started libcontainer container 42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142. 
mai 30 15:31:45 curie systemd[1]: docker-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.scope: Succeeded. 
mai 30 15:31:45 curie dockerd[3199]: time="2022-05-30T15:31:45.883019128-04:00" level=info msg="shim disconnected" id=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 
mai 30 15:31:45 curie dockerd[1726]: time="2022-05-30T15:31:45.883064491-04:00" level=info msg="ignoring event" container=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete" 
mai 30 15:31:45 curie systemd[1]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: run-docker-netns-e45f5cf5f465.mount: Succeeded. 
mai 30 15:31:45 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded. 
mai 30 15:31:45 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.

That's double or triple the run time, from 2 seconds to 6 seconds. Most of the time is spent in run time, inside the container. Here's the breakdown:

container setup: ~2 seconds
container run: ~4 seconds
container teardown: ~1 second
total run time: about ~6-7 seconds

That's a two- to three-fold increase! Clearly something is going on here that I should tweak. It's possible that code path is less optimized in Docker. I also worry about podman, but apparently it also supports ZFS backends. Possibly it would perform better, but at this stage I wouldn't have a good comparison: maybe it would have performed better on non-ZFS as well...

Interactivity While doing the offsite backups (below), the system became somewhat "sluggish". I felt everything was slow, and I estimate it introduced ~50ms latency in any input device. Arguably, those are all USB and the external drive was connected through USB, but I suspect the ZFS drivers are not as well tuned with the scheduler as the regular filesystem drivers...

Recovery procedures For test purposes, I unmounted all systems during the procedure:

umount /mnt/boot/efi /mnt/boot/run
umount -a -t zfs
zpool export -a

And disconnected the drive, to see how I would recover this system from another Linux system in case of a total motherboard failure. To import an existing pool, plug the device, then import the pool with an alternate root, so it doesn't mount over your existing filesystems, then you mount the root filesystem and all the others:

zpool import -l -a -R /mnt &&
zfs mount rpool/ROOT/debian &&
zfs mount -a &&
mount /dev/sdc2 /mnt/boot/efi &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock

Offsite backup Part of the goal of using ZFS is to simplify and harden backups. I wanted to experiment with shorter recovery times specifically both point in time recovery objective and recovery time objective and faster incremental backups. This is, therefore, part of my backup services. This section documents how an external NVMe enclosure was setup in a pool to mirror the datasets from my workstation. The final setup should include syncoid copying datasets to the backup server regularly, but I haven't finished that configuration yet.

Partitioning The above partitioning procedure used sgdisk, but I couldn't figure out how to do this with sgdisk, so this uses sfdisk to dump the partition from the first disk to an external, identical drive:

sfdisk -d /dev/nvme0n1   sfdisk --no-reread /dev/sda --force

Pool creation This is similar to the main pool creation, except we tweaked a few bits after changing the upstream procedure:

zpool create \
        -o cachefile=/etc/zfs/zpool.cache \
        -o ashift=12 -d \
        -o feature@async_destroy=enabled \
        -o feature@bookmarks=enabled \
        -o feature@embedded_data=enabled \
        -o feature@empty_bpobj=enabled \
        -o feature@enabled_txg=enabled \
        -o feature@extensible_dataset=enabled \
        -o feature@filesystem_limits=enabled \
        -o feature@hole_birth=enabled \
        -o feature@large_blocks=enabled \
        -o feature@lz4_compress=enabled \
        -o feature@spacemap_histogram=enabled \
        -o feature@zpool_checkpoint=enabled \
        -O acltype=posixacl -O xattr=sa \
        -O compression=lz4 \
        -O devices=off \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/boot -R /mnt \
        bpool-tubman /dev/sdb3

The change from the main boot pool are:

no unicode normalization
different device path (sdb used to be the M.2 device, it's now nvme0n1)
reordered parameters

Main pool creation is:

zpool create \
        -o ashift=12 \
        -O encryption=on -O keylocation=prompt -O keyformat=passphrase \
        -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
        -O compression=zstd \
        -O relatime=on \
        -O canmount=off \
        -O mountpoint=/ -R /mnt \
        rpool-tubman /dev/sdb4

First sync I used syncoid to copy all pools over to the external device. syncoid is a thing that's part of the sanoid project which is specifically designed to sync snapshots between pool, typically over SSH links but it can also operate locally. The sanoid command had a --readonly argument to simulate changes, but syncoid didn't so I tried to fix that with an upstream PR. It seems it would be better to do this by hand, but this was much easier. The full first sync was:

root@curie:/home/anarcat# ./bin/syncoid -r  bpool bpool-tubman
CRITICAL ERROR: Target bpool-tubman exists but has no snapshots matching with bpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.
          NOTE: Target bpool-tubman dataset is < 64MB used - did you mistakenly run
                 zfs create bpool-tubman  on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot bpool/BOOT@test (~ 42 KB) to new target filesystem:
44.2KiB 0:00:00 [4.19MiB/s] [========================================================================================================================] 103%            
INFO: Updating new target filesystem with incremental bpool/BOOT@test ... syncoid_curie_2022-05-30:12:50:39 (~ 4 KB):
2.13KiB 0:00:00 [ 114KiB/s] [===============================================================>                                                         ] 53%            
INFO: Sending oldest full snapshot bpool/BOOT/debian@install (~ 126.0 MB) to new target filesystem:
 126MiB 0:00:00 [ 308MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental bpool/BOOT/debian@install ... syncoid_curie_2022-05-30:12:50:39 (~ 113.4 MB):
 113MiB 0:00:00 [ 315MiB/s] [=======================================================================================================================>] 100%
root@curie:/home/anarcat# ./bin/syncoid -r  rpool rpool-tubman
CRITICAL ERROR: Target rpool-tubman exists but has no snapshots matching with rpool!
                Replication to target would require destroying existing
                target. Cowardly refusing to destroy your existing target.
          NOTE: Target rpool-tubman dataset is < 64MB used - did you mistakenly run
                 zfs create rpool-tubman  on the target? ZFS initial
                replication must be to a NON EXISTENT DATASET, which will
                then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot rpool/ROOT@syncoid_curie_2022-05-30:12:50:51 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [2.44MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/ROOT/debian@install (~ 25.9 GB) to new target filesystem:
25.9GiB 0:03:33 [ 124MiB/s] [=======================================================================================================================>] 100%            
INFO: Updating new target filesystem with incremental rpool/ROOT/debian@install ... syncoid_curie_2022-05-30:12:50:52 (~ 3.9 GB):
3.92GiB 0:00:33 [ 119MiB/s] [======================================================================================================================>  ] 99%            
INFO: Sending oldest full snapshot rpool/home@syncoid_curie_2022-05-30:12:55:04 (~ 276.8 GB) to new target filesystem:
 277GiB 0:27:13 [ 174MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/home/root@syncoid_curie_2022-05-30:13:22:19 (~ 2.2 GB) to new target filesystem:
2.22GiB 0:00:25 [90.2MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var@syncoid_curie_2022-05-30:13:22:47 (~ 5.6 GB) to new target filesystem:
5.56GiB 0:00:32 [ 176MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/cache@syncoid_curie_2022-05-30:13:23:22 (~ 627.3 MB) to new target filesystem:
 627MiB 0:00:03 [ 169MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib@syncoid_curie_2022-05-30:13:23:28 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [1.40MiB/s] [===========================================================================>                                             ] 63%            
INFO: Sending oldest full snapshot rpool/var/lib/docker@syncoid_curie_2022-05-30:13:23:28 (~ 442.6 MB) to new target filesystem:
 443MiB 0:00:04 [ 103MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 (~ 6.3 MB) to new target filesystem:
6.49MiB 0:00:00 [12.9MiB/s] [========================================================================================================================] 102%            
INFO: Updating new target filesystem with incremental rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 ... syncoid_curie_2022-05-30:13:23:34 (~ 4 KB):
1.52KiB 0:00:00 [27.6KiB/s] [============================================>                                                                            ] 38%            
INFO: Sending oldest full snapshot rpool/var/lib/flatpak@syncoid_curie_2022-05-30:13:23:36 (~ 2.0 GB) to new target filesystem:
2.00GiB 0:00:17 [ 115MiB/s] [=======================================================================================================================>] 100%            
INFO: Sending oldest full snapshot rpool/var/tmp@syncoid_curie_2022-05-30:13:23:55 (~ 57.0 MB) to new target filesystem:
61.8MiB 0:00:01 [45.0MiB/s] [========================================================================================================================] 108%            
INFO: Clone is recreated on target rpool-tubman/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205 based on rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254
INFO: Sending oldest full snapshot rpool/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205@syncoid_curie_2022-05-30:13:23:58 (~ 218.6 MB) to new target filesystem:
 219MiB 0:00:01 [ 151MiB/s] [=======================================================================================================================>] 100%

Funny how the CRITICAL ERROR doesn't actually stop syncoid and it just carries on merrily doing when it's telling you it's "cowardly refusing to destroy your existing target"... Maybe that's because my pull request broke something though... During the transfer, the computer was very sluggish: everything feels like it has ~30-50ms latency extra:

anarcat@curie:sanoid$ LANG=C top -b  -n 1   head -20
top - 13:07:05 up 6 days,  4:01,  1 user,  load average: 16.13, 16.55, 11.83
Tasks: 606 total,   6 running, 598 sleeping,   0 stopped,   2 zombie
%Cpu(s): 18.8 us, 72.5 sy,  1.2 ni,  5.0 id,  1.2 wa,  0.0 hi,  1.2 si,  0.0 st
MiB Mem :  15898.4 total,   1387.6 free,  13170.0 used,   1340.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1319.8 avail Mem 
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
     70 root      20   0       0      0      0 S  83.3   0.0   6:12.67 kswapd0
4024878 root      20   0  282644  96432  10288 S  44.4   0.6   0:11.43 puppet
3896136 root      20   0   35328  16528     48 S  22.2   0.1   2:08.04 mbuffer
3896135 root      20   0   10328    776    168 R  16.7   0.0   1:22.93 zfs
3896138 root      20   0   10588    788    156 R  16.7   0.0   1:49.30 zfs
    350 root       0 -20       0      0      0 R  11.1   0.0   1:03.53 z_rd_int
    351 root       0 -20       0      0      0 S  11.1   0.0   1:04.15 z_rd_int
3896137 root      20   0    4384    352    244 R  11.1   0.0   0:44.73 pv
4034094 anarcat   30  10   20028  13960   2428 S  11.1   0.1   0:00.70 mbsync
4036539 anarcat   20   0    9604   3464   2408 R  11.1   0.0   0:00.04 top
    352 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    353 root       0 -20       0      0      0 S   5.6   0.0   1:03.64 z_rd_int
    354 root       0 -20       0      0      0 S   5.6   0.0   1:04.01 z_rd_int

I wonder how much of that is due to syncoid, particularly because I often saw mbuffer and pv in there which are not strictly necessary to do those kind of operations, as far as I understand. Once that's done, export the pools to disconnect the drive:

zpool export bpool-tubman
zpool export rpool-tubman

Raw disk benchmark Copied the 512GB SSD/M.2 device to another 1024GB NVMe/M.2 device:

anarcat@curie:~$ sudo dd if=/dev/sdb of=/dev/sdc bs=4M status=progress conv=fdatasync
499944259584 octets (500 GB, 466 GiB) copi s, 1713 s, 292 MB/s
119235+1 enregistrements lus
119235+1 enregistrements  crits
500107862016 octets (500 GB, 466 GiB) copi s, 1719,93 s, 291 MB/s

... while both over USB, whoohoo 300MB/s!

Monitoring ZFS should be monitoring your pools regularly. Normally, the [[!debman zed]] daemon monitors all ZFS events. It is the thing that will report when a scrub failed, for example. See this configuration guide. Scrubs should be regularly scheduled to ensure consistency of the pool. This can be done in newer zfsutils-linux versions (bullseye-backports or bookworm) with one of those, depending on the desired frequency:

systemctl enable zfs-scrub-weekly@rpool.timer --now
systemctl enable zfs-scrub-monthly@rpool.timer --now

When the scrub runs, if it finds anything it will send an event which will get picked up by the zed daemon which will then send a notification, see below for an example. TODO: deploy on curie, if possible (probably not because no RAID) TODO: this should be in Puppet

Scrub warning example So what happens when problems are found? Here's an example of how I dealt with an error I received. After setting up another server (tubman) with ZFS, I eventually ended up getting a warning from the ZFS toolchain.

Date: Sun, 09 Oct 2022 00:58:08 -0400
From: root <root@anarc.at>
To: root@anarc.at
Subject: ZFS scrub_finish event for rpool on tubman
ZFS has finished a scrub:
   eid: 39536
 class: scrub_finish
  host: tubman
  time: 2022-10-09 00:58:07-0400
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:33:57 with 0 errors on Sun Oct  9 00:58:07 2022
config:
        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb4    ONLINE       0     1     0
            sdc4    ONLINE       0     0     0
        cache
          sda3      ONLINE       0     0     0
errors: No known data errors

This, in itself, is a little worrisome. But it helpfully links to this more detailed documentation (and props up there: the link still works) which explains this is a "minor" problem (something that could be included in the report). In this case, this happened on a server setup on 2021-04-28, but the disks and server hardware are much older. The server itself (marcos v1) was built around 2011, over 10 years ago now. The hard drive in question is:

root@tubman:~# smartctl -i -qnoserial /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:02:32 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Some more SMART stats:

root@tubman:~# smartctl -a -qnoserial /dev/sdb   grep -e  Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   086   086   000    Old_age   Always       -       12464 (206 202 0)
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       10966h+55m+23.757s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       21107792664
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3201579750

That's over a year of power on, which shouldn't be so bad. It has written about 10TB of data (21107792664 LBAs * 512 byte/LBA), which is about two full writes. According to its specification, this device is supposed to support 55 TB/year of writes, so we're far below spec. Note that are still far from the "non-recoverable read error per bits" spec (1 per 10E15), as we've basically read 13E12 bits (3201579750 LBAs * 512 byte/LBA = 13E12 bits). It's likely this disk was made in 2018, so it is in its fourth year. Interestingly, /dev/sdc is also a Seagate drive, but of a different series:

root@tubman:~# smartctl -qnoserial  -i /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5
Device Model:     ST4000DM004-2CV104
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Oct 11 11:21:35 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

It has seen much more reads than the other disk which is also interesting:

root@tubman:~# smartctl -a -qnoserial /dev/sdc   grep -e  Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes:     512 bytes logical, 4096 bytes physical
  9 Power_On_Hours          0x0032   059   059   000    Old_age   Always       -       36240
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       33994h+10m+52.118s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       30730174438
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       51894566538

That's 4 years of Head_Flying_Hours, and over 4 years (4 years and 48 days) of Power_On_Hours. The copyright date on that drive's specs goes back to 2016, so it's a much older drive. SMART self-test succeeded.

Remaining issues

TODO: move send/receive backups to offsite host, see also zfs for alternatives to syncoid/sanoid there

TODO: setup backup cron job (or timer?)

TODO: swap still not setup on curie, see zfs

TODO: document this somewhere: bpool and rpool are both pools and datasets. that's pretty confusing, but also very useful because it allows for pool-wide recursive snapshots, which are used for the backup system

fio improvements I really want to improve my experience with fio. Right now, I'm just cargo-culting stuff from other folks and I don't really like it. stressant is a good example of my struggles, in the sense that it doesn't really work that well for disk tests. I would love to have just a single .fio job file that lists multiple jobs to run serially. For example, this file describes the above workload pretty well:

[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# no need to drop caches, done by default
# invalidate=1
# Single 4KiB random read/write process
[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1
[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1
# 16 parallel 64KiB random read/write processes:
[randread-64k-256m-16x]
rw=randread
bs=64k
size=256m
numjobs=16
iodepth=16
[randwrite-64k-256m-16x]
rw=randwrite
bs=64k
size=256m
numjobs=16
iodepth=16
# Single 1MiB random read/write process
[randread-1m-16g-1x]
rw=randread
bs=1m
size=16g
numjobs=1
iodepth=1
[randwrite-1m-16g-1x]
rw=randwrite
bs=1m
size=16g
numjobs=1
iodepth=1

... except the jobs are actually started in parallel, even though they are stonewall'd, as far as I can tell by the reports. I sent a mail to the fio mailing list for clarification. It looks like the jobs are started in parallel, but actual (correctly) run serially. It seems like this might just be a matter of reporting the right timestamps in the end, although it does feel like starting all the processes (even if not doing any work yet) could skew the results.

Hangs during procedure During the procedure, it happened a few times where any ZFS command would completely hang. It seems that using an external USB drive to sync stuff didn't work so well: sometimes it would reconnect under a different device (from sdc to sdd, for example), and this would greatly confuse ZFS. Here, for example, is sdd reappearing out of the blue:

May 19 11:22:53 curie kernel: [  699.820301] scsi host4: uas
May 19 11:22:53 curie kernel: [  699.820544] usb 2-1: authorized to connect
May 19 11:22:53 curie kernel: [  699.922433] scsi 4:0:0:0: Direct-Access     ROG      ESD-S1C          0    PQ: 0 ANSI: 6
May 19 11:22:53 curie kernel: [  699.923235] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 19 11:22:53 curie kernel: [  699.923676] sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
May 19 11:22:53 curie kernel: [  699.923788] sd 4:0:0:0: [sdd] Write Protect is off
May 19 11:22:53 curie kernel: [  699.923949] sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 19 11:22:53 curie kernel: [  699.924149] sd 4:0:0:0: [sdd] Optimal transfer size 33553920 bytes
May 19 11:22:53 curie kernel: [  699.961602]  sdd: sdd1 sdd2 sdd3 sdd4
May 19 11:22:53 curie kernel: [  699.996083] sd 4:0:0:0: [sdd] Attached SCSI disk

Next time I run a ZFS command (say zpool list), the command completely hangs (D state) and this comes up in the logs:

May 19 11:34:21 curie kernel: [ 1387.914843] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=71344128 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914859] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=205565952 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914948] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073487872 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.915165] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272793600 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915183] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=339853312 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915648] WARNING: Pool 'bpool' has encountered an uncorrectable I/O failure and has been suspended.
May 19 11:34:21 curie kernel: [ 1387.915648] 
May 19 11:37:25 curie kernel: [ 1571.558614] task:txg_sync        state:D stack:    0 pid:  997 ppid:     2 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.558623] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.558640]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.558650]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.558670]  schedule_timeout+0x8b/0x140
May 19 11:37:25 curie kernel: [ 1571.558675]  ? __next_timer_interrupt+0x110/0x110
May 19 11:37:25 curie kernel: [ 1571.558678]  io_schedule_timeout+0x4c/0x80
May 19 11:37:25 curie kernel: [ 1571.558689]  __cv_timedwait_common+0x12b/0x160 [spl]
May 19 11:37:25 curie kernel: [ 1571.558694]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.558702]  __cv_timedwait_io+0x15/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.558816]  zio_wait+0x129/0x2b0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.558929]  dsl_pool_sync+0x461/0x4f0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559032]  spa_sync+0x575/0xfa0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559138]  ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559245]  txg_sync_thread+0x2e0/0x4a0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559354]  ? txg_fini+0x240/0x240 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559366]  thread_generic_wrapper+0x6f/0x80 [spl]
May 19 11:37:25 curie kernel: [ 1571.559376]  ? __thread_exit+0x20/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.559379]  kthread+0x11b/0x140
May 19 11:37:25 curie kernel: [ 1571.559382]  ? __kthread_bind_mask+0x60/0x60
May 19 11:37:25 curie kernel: [ 1571.559386]  ret_from_fork+0x22/0x30
May 19 11:37:25 curie kernel: [ 1571.559401] task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
May 19 11:37:25 curie kernel: [ 1571.559404] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559409]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559412]  ? __kmalloc_node+0x141/0x2b0
May 19 11:37:25 curie kernel: [ 1571.559417]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559420]  schedule_preempt_disabled+0xa/0x10
May 19 11:37:25 curie kernel: [ 1571.559424]  __mutex_lock.constprop.0+0x133/0x460
May 19 11:37:25 curie kernel: [ 1571.559435]  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:37:25 curie kernel: [ 1571.559537]  spa_all_configs+0x41/0x120 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559644]  zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559752]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559758]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.559860]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559866]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.559869]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.559873]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.559876] RIP: 0033:0x7fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559878] RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.559881] RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559883] RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:37:25 curie kernel: [ 1571.559885] RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:37:25 curie kernel: [ 1571.559886] R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:37:25 curie kernel: [ 1571.559888] R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:37:25 curie kernel: [ 1571.559980] task:zpool           state:D stack:    0 pid:11815 ppid:  3816 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.559983] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559988]  __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559992]  schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559995]  io_schedule+0x42/0x70
May 19 11:37:25 curie kernel: [ 1571.560004]  cv_wait_common+0xac/0x130 [spl]
May 19 11:37:25 curie kernel: [ 1571.560008]  ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.560118]  txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560223]  txg_wait_synced+0xc/0x40 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560325]  spa_export_common+0x4cd/0x590 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560430]  ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560537]  zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560543]  ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.560644]  zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560649]  __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.560653]  do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.560656]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.560659] RIP: 0033:0x7fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560661] RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.560664] RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560666] RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:37:25 curie kernel: [ 1571.560667] RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:37:25 curie kernel: [ 1571.560669] R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:37:25 curie kernel: [ 1571.560671] R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40

Here's another example, where you see the USB controller bleeping out and back into existence:

mai 19 11:38:39 curie kernel: usb 2-1: USB disconnect, device number 2
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
mai 19 11:39:25 curie kernel: INFO: task zed:1564 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zed             state:D stack:    0 pid: 1564 ppid:     1 flags:0x00000000
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel:  __schedule+0x282/0x870
mai 19 11:39:25 curie kernel:  ? __kmalloc_node+0x141/0x2b0
mai 19 11:39:25 curie kernel:  schedule+0x46/0xb0
mai 19 11:39:25 curie kernel:  schedule_preempt_disabled+0xa/0x10
mai 19 11:39:25 curie kernel:  __mutex_lock.constprop.0+0x133/0x460
mai 19 11:39:25 curie kernel:  ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
mai 19 11:39:25 curie kernel:  spa_all_configs+0x41/0x120 [zfs]
mai 19 11:39:25 curie kernel:  zfs_ioc_pool_configs+0x17/0x70 [zfs]
mai 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fcf0ef32cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
mai 19 11:39:25 curie kernel: RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
mai 19 11:39:25 curie kernel: RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
mai 19 11:39:25 curie kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
mai 19 11:39:25 curie kernel: R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
mai 19 11:39:25 curie kernel: INFO: task zpool:11815 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel:       Tainted: P          IOE     5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zpool           state:D stack:    0 pid:11815 ppid:  2621 flags:0x00004004
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel:  __schedule+0x282/0x870
mai 19 11:39:25 curie kernel:  schedule+0x46/0xb0
mai 19 11:39:25 curie kernel:  io_schedule+0x42/0x70
mai 19 11:39:25 curie kernel:  cv_wait_common+0xac/0x130 [spl]
mai 19 11:39:25 curie kernel:  ? add_wait_queue_exclusive+0x70/0x70
mai 19 11:39:25 curie kernel:  txg_wait_synced_impl+0xc9/0x110 [zfs]
mai 19 11:39:25 curie kernel:  txg_wait_synced+0xc/0x40 [zfs]
mai 19 11:39:25 curie kernel:  spa_export_common+0x4cd/0x590 [zfs]
mai 19 11:39:25 curie kernel:  ? zfs_log_history+0x9c/0xf0 [zfs]
mai 19 11:39:25 curie kernel:  zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel:  ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel:  zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel:  __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel:  do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fdc23be2cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
mai 19 11:39:25 curie kernel: RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
mai 19 11:39:25 curie kernel: RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
mai 19 11:39:25 curie kernel: R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
mai 19 11:39:25 curie kernel: R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40

I understand those are rather extreme conditions: I would fully expect the pool to stop working if the underlying drives disappear. What doesn't seem acceptable is that a command would completely hang like this.

References See the zfs documentation for more information about ZFS, and tubman for another installation and migration procedure.

I m trying to replace my old OpenPGP key with a new one. The old key wasn t compromised or lost or anything bad. Is still valid, but I plan to get rid of it soon. It was created in 2013. The new key id fingerprint is: AA66280D4EF0BFCC6BFC2104DA5ECB231C8F04C4 I plan to use the new key for things like encrypted emails, uploads to the Debian archive, and more. Also, the new key includes an identity with a newer personal email address I plan to use soon: arturo.bg@arturo.bg The new key has been uploaded to some public keyservers. If you would like to sign the new key, please follow the steps in the Debian wiki.

If you are curious about what that long code block contains, check this https://cirw.in/gpg-decoder/ For the record, the old key fingerprint is: DD9861AB23DC3333892E07A968E713981D1515F8 Cheers!

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

halibut: add documentation generator metadata

mpv-mpris: fix tests

duck: obsolete site indicators

devscripts: fix example, add experimental pseudo-excuses

dpkg-repack: show dpkg-deb cmd

Debian QA services: add experimental pseudo-excuses, improve excuses page

Debian usertags: fixed ports, lintian usertags

Debian package uploads: mpv-mpris backport

Debian wiki pages: ArchitectureSpecificsMemo, DebianArt/Themes/Emerald, DebianDesktop/Artwork (Bookworm), DebianLogo, DeveloperNews, FPGA, Gnome, HomeAssistant, InstructionSelection (1 2), LTS, PackageKit, PipeWire, Salsa/support, Services, ServicesHosting, SIMDEverywhere, SimpleBackportCreation, spotify, SystemBuildTools, ThomasChung, UsingSymbolsFiles, ZRam, Zswap

Libreplanet wiki pages: Group:Free_Software_Foundation/Ideas

FOSSjobs wiki pages: Resources

Wikipedia: Message-ID

Issues

Conffile removal in dkms

Features in UDD, virtual-fido

Security hardening in open-in-mpv

Packaging needed for prr, libredirect, wayback-machine-webextension

Underlinking in libgrpc

Data import problem in UDD

HTML injection in a conversion utility (sent privately)

Review

Debian wiki: RecentChanges for the month

Debian BTS usertags: changes for the month

Debian screenshots:

approved apticron-systemd argyll chromium-browser coreutils e2fsprogs exuberant-ctags firefox gnome-flashback gnome-recipes gparted initramfs-tools kdocker kleopatra latte-dock mac-robber mail-expire mgba-qt mount parolottero-data-us plasma-widget-folderview reglookup stockfish sxmo-utils uuid-runtime yoshimi

rejected weboob-qt/quake (politics), pass (baby photo), hylafax-client/libc6/ (photo of broken screen), aptitude (screenshots website), apt-file (NSFW photo), synaesthesia (meme image), kde-full (photo of human), ubuntustudio-menu (too small), tesseract-ocr-eng (identification card), mpv (proprietary video), g++-9/python3-ldap (random logo), synaptic (logo), openjdk-8-jre (NSFW, mobile browser), libreoffice-base-core/librtmp1/audacity (Android/iOS app), virtualbox-guest-additions-iso (ISO contents not interface), weboob-qt (another package), freecell-solver-bin (accidentally)

Administration

Debian QA services: deploy changes

Debian wiki: approve accounts

Communication

Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors All work was done on a volunteer basis.

Search Results: "daf"

18 April 2024

13 April 2024

11 April 2024

13 February 2024

+1, Helpful If you ve found this post informative, why not buy me a refreshing beverage? My typing fingers (both of them) thank you in advance for your generosity.

11 January 2024

What s missing?

Automated GUI testing / accessibility / automation Automation of GUI tasks is a powerful feature, so is the ability to auto-test GUIs. This is being worked on, with libei and wlheadless-run (and stuff like ydotool exists too), but we re not fully there yet.

10 January 2024

9 January 2024

7 January 2024

Conferences The only Free Software related conference I made it to this year was DebConf23 in Kochi, India. Changes with projects at work meant I couldn t justify anything work related. This year I m planning to make it to FOSDEM, and haven t made a decision on DebConf24 yet.

19 December 2023

12 November 2023

22 October 2023

8 September 2023

On our mailing list Carles Pina i Estany wrote to our mailing list during August with an interesting question concerning the practical steps to reproduce the hello-traditional package from Debian. The entire thread can be viewed from the archive page, as can Vagrant Cascadian s reply.

Website updates Rahul Bajaj updated our website to add a series of environment variations related to reproducible builds [ ], Russ Cox added the Go programming language to our projects page [ ] and Vagrant Cascadian fixed a number of broken links and typos around the website [ ][ ][ ].

9 August 2023

6 April 2023

19 January 2023

24 December 2022

14 December 2022

16 November 2022

Real world experience This section document not synthetic backups, but actual real world workloads, comparing before and after I switched my workstation to ZFS.

Partitioning The above partitioning procedure used sgdisk, but I couldn't figure out how to do this with sgdisk, so this uses sfdisk to dump the partition from the first disk to an external, identical drive: sfdisk -d /dev/nvme0n1 sfdisk --no-reread /dev/sda --force

References See the zfs documentation for more information about ZFS, and tubman for another installation and migration procedure.

3 November 2022

3 October 2022

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Issues Conffile removal in dkms Features in UDD, virtual-fido Security hardening in open-in-mpv Packaging needed for prr, libredirect, wayback-machine-webextension Underlinking in libgrpc Data import problem in UDD HTML injection in a conversion utility (sent privately)

Administration Debian QA services: deploy changes Debian wiki: approve accounts

Communication Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors All work was done on a volunteer basis.

On our mailing list Carles Pina i Estany wrote to our mailing list during August with an interesting question concerning the practical steps to reproduce the `hello-traditional` package from Debian. The entire thread can be viewed from the archive page, as can Vagrant Cascadian s reply.

Partitioning The above partitioning procedure used `sgdisk`, but I couldn't figure out how to do this with `sgdisk`, so this uses `sfdisk` to dump the partition from the first disk to an external, identical drive:
`sfdisk -d /dev/nvme0n1 sfdisk --no-reread /dev/sda --force`

Issues

Conffile removal in dkms

Features in UDD, virtual-fido

Security hardening in open-in-mpv

Packaging needed for prr, libredirect, wayback-machine-webextension

Underlinking in libgrpc

Data import problem in UDD

HTML injection in a conversion utility (sent privately)

Administration

Debian QA services: deploy changes

Debian wiki: approve accounts

Communication

Respond to queries from Debian users and contributors on the mailing lists and IRC